Calculator node for pose-based action recognition - MediaPipe

I want to add an action_recognition calculator node to the pose_landmark detector (pose_landmark_gpu.pbtxt). Does anyone know if there is already a calculator implementation suited for that purpose?
i.e.
Input: pose landmarks
Inference via tflite model
Output: probability values for the respective action classes
I've seen that the original pose landmark detector uses tensors_to_landmarks_calculator.cc. I would need a similar file but for different input & output types. Any idea if there is a "template" cc file that I could adapt to my use case?
Just for better understanding, here is my edited pbtxt of the pose_landmark detector with an additional node for action classification:
# GPU buffer. (GpuBuffer)
input_stream: "input_video"
output_stream: "output_video" # Output image with rendered results. (GpuBuffer)
output_stream: "pose_landmarks" # Pose landmarks. (NormalizedLandmarkList)
output_stream: "action_detection" # Action Probabilities
node {
  calculator: "FlowLimiterCalculator"
  input_stream: "input_video"
  input_stream: "FINISHED:output_video"
  input_stream_info: {
    tag_index: "FINISHED"
    back_edge: true
  }
  output_stream: "throttled_input_video"
}
# Subgraph that detects poses and corresponding landmarks.
node {
calculator: "PoseLandmarkGpu"
input_stream: "IMAGE:throttled_input_video"
output_stream: "LANDMARKS:pose_landmarks"
output_stream: "DETECTION:pose_detection"
output_stream: "ROI_FROM_LANDMARKS:roi_from_landmarks"
}
# Subgraph that renders pose-landmark annotation onto the input image.
node {
calculator: "PoseRendererGpu"
input_stream: "IMAGE:throttled_input_video"
input_stream: "LANDMARKS:pose_landmarks"
input_stream: "ROI:roi_from_landmarks"
input_stream: "DETECTION:pose_detection"
output_stream: "IMAGE:output_video"
}
# Subgraph that detects actions from poses
node {
calculator: "ActionDetectorGPU"
input_stream: "LANDMARKS:pose_landmarks"
output_stream: "ACTION:action_detection"
}
Update
There is an open-source project called SigNN that does the same thing as I'm intending, just for hand pose classification (into American Sign Language letters). I'm going to plow through that...

Here is a more general formulation of a similar problem. There is a solution using MediaPipeUnityPlugin (but the same graph would also work in pure MediaPipe, though there is no released driver code at the time of writing this).

Related

TFX Evaluator does not seem to recognize the baseline model output from the ResolverNode

I want to use the validation capabilities (model diff or model comparison) of the Evaluator component in TFX, so I used the base code of the taxi template in TFX to do so.
The problem is that when the Evaluator component runs in Kubeflow on GCP, it throws the following error message in the logs:
ERROR:absl:There are change thresholds, but the baseline is missing. This is allowed only when rubber stamping (first run).
WARNING:absl:"maybe_add_baseline" and "maybe_remove_baseline" are deprecated,
please use "has_baseline" instead.
INFO:absl:Request was made to ignore the baseline ModelSpec and any change thresholds. This is likely because a baseline model was not provided: updated_config=
model_specs {
  name: "candidate"
  signature_name: "my_model_validation_signature"
  label_key: "n_trips"
}
slicing_specs {
}
metrics_specs {
  metrics {
    class_name: "MeanSquaredError"
    threshold {
      value_threshold {
        upper_bound {
          value: 10000.0
        }
      }
    }
  }
}
INFO:absl:ModelSpec name "candidate" is being ignored and replaced by "" because a single ModelSpec is being used
Looking at the source code of the executor of the Evaluator component in the TFX repo [1], line 138:
has_baseline = bool(input_dict.get(BASELINE_MODEL_KEY))
and then enters a function on line 141:
eval_config = tfma.update_eval_config_with_defaults(eval_config, has_baseline=has_baseline)
It then throws the cited error message, which is only possible if the following condition is met (from the TFX repo [2]):
if (not has_baseline and has_change_threshold(eval_config) and
    not rubber_stamp):
  # TODO(b/173657964): Raise an error instead of logging an error.
  logging.error('There are change thresholds, but the baseline is missing. '
                'This is allowed only when rubber stamping (first run).')
And in fact that's the error that I get in the logs; the model is evaluated but not compared to a baseline, even though I provide the baseline the way indicated by the sample code, for example:
eval_config = tfma.EvalConfig(
    model_specs=[
        tfma.ModelSpec(name=tfma.CANDIDATE_KEY, label_key='n_trips',
                       signature_name='my_model_validation_signature'),
        tfma.ModelSpec(name=tfma.BASELINE_KEY, label_key='n_trips',
                       signature_name='my_model_validation_signature',
                       is_baseline=True)
    ],
    slicing_specs=[tfma.SlicingSpec()],
    metrics_specs=[
        tfma.MetricsSpec(metrics=[
            tfma.MetricConfig(
                class_name="MeanSquaredError",  # 'mean_absolute_error'
                threshold=tfma.MetricThreshold(
                    value_threshold=tfma.GenericValueThreshold(
                        upper_bound={'value': 10000}),
                    change_threshold=tfma.GenericChangeThreshold(
                        direction=tfma.MetricDirection.LOWER_IS_BETTER,
                        relative={'value': 1})))
        ])
    ])

evaluator = Evaluator(
    examples=example_gen.outputs['examples'],
    model=trainer.outputs['model'],
    baseline_model=model_resolver.outputs['model'],
    # Change threshold will be ignored if there is no baseline (first run).
    eval_config=eval_config)
# TODO(step 6): Uncomment here to add Evaluator to the pipeline.
components.append(evaluator)
And it continues...
It was solved by upgrading from version 0.26.0 to version 0.27.0.
The problem arises because the default notebook in Kubeflow Pipelines on Google Cloud Platform installs version 0.26.0...
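If you want to confirm which version the notebook environment is actually running before and after the upgrade, something like the following works (the pip pin in the comment is just the fixed version mentioned above):

import tfx
print(tfx.__version__)  # e.g. prints 0.26.0 on the affected default notebook

# Then upgrade and restart the kernel, for example:
#   pip install --upgrade "tfx==0.27.0"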

How can I converge the loss to a lower value? (TensorFlow)

I used the TensorFlow Object Detection API.
Here is my environment.
All images are from the COCO API.
TensorFlow version: 1.13.1
TensorBoard version: 1.13.1
Number of test images: 3000
Number of train images: 24000
Pre-trained model: SSD MobileNet v2 quantized 300x300 COCO
Number of detection classes: 1 (person)
And here is my train_config.
train_config: {
  batch_size: 6
  optimizer {
    adam_optimizer: {
      learning_rate {
        exponential_decay_learning_rate: {
          initial_learning_rate: 0.000035
          decay_steps: 7
          decay_factor: 0.98
        }
      }
    }
  }
  fine_tune_checkpoint: "D:/TF/models/research/object_detection/ssd_mobilenet_v2_quantized_300x300_coco_2019_01_03/model.ckpt"
  fine_tune_checkpoint_type: "detection"
  num_steps: 200000
  data_augmentation_options {
    random_horizontal_flip {
    }
  }
  data_augmentation_options {
    ssd_random_crop {
    }
  }
}
I can't find an optimized learning rate or appropriate decay steps and factor.
So I ran many trainings, but the result is always similar.
How can I fix this?
I have already spent a week on this problem.
In another post, someone recommended adding noise to the dataset (images).
But I don't know what that means.
How can I make that happen?
I think what was referenced in the other post was to do some data augmentation by adding noisy images to your training dataset. It means that you apply random transformations to your input so that the model generalizes better.
A type of noise that can be used is random Gaussian noise (https://en.wikipedia.org/wiki/Gaussian_noise), which is applied per patch in the Object Detection API.
Although it seems that you have enough training images, it is worth a shot.
The noise configuration would look like:
...
data_augmentation_options {
  random_horizontal_flip {
  }
}
data_augmentation_options {
  ssd_random_crop {
  }
}
data_augmentation_options {
  random_patch_gaussian {
    # The patch size will be chosen to be in the range
    # [min_patch_size, max_patch_size).
    min_patch_size: 300
    max_patch_size: 300  # if you want the whole image to be noisy
  }
}
...
For the list of data augmentation options, you can check:
https://github.com/tensorflow/models/blob/master/research/object_detection/protos/preprocessor.proto
Regarding the learning rate, one common strategy is to try a large learning rate (0.02 for instance) and a very small one, as you have already tried. I would recommend trying 0.02, leaving it for a while, or using the exponential decay learning rate to see if the results are better.
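As a side note on the decay settings in your posted train_config: exponential decay follows lr = initial_learning_rate * decay_factor ** (global_step / decay_steps) (the staircase variant uses floor(global_step / decay_steps), which gives the same order of magnitude), so decay_steps: 7 shrinks the learning rate almost immediately. A small sketch in plain Python, using the values from the config above, shows the effect:

def decayed_lr(initial_lr, decay_factor, decay_steps, global_step):
    # lr = initial_lr * decay_factor ** (global_step / decay_steps)
    return initial_lr * decay_factor ** (global_step / decay_steps)

for step in (0, 100, 1000, 10000):
    print(step, decayed_lr(0.000035, 0.98, 7, step))
# Roughly 2.6e-5 at step 100, 2e-6 at step 1000, and ~1e-17 by step 10000:
# training effectively stalls, so decay_steps should be closer to the order of num_steps.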
Changing the batch_size can also have some benefits; try batch_size = 2 instead of 6.
I would also recommend leaving the training running for more steps until you see no improvement at all, maybe until the 200000 steps defined in your configuration.
Some deeper strategies can help the model perform better; they are described in this answer: https://stackoverflow.com/a/61699696/14203615
That being said, if your dataset is correctly made you should get good results on your test set.

Memory leak caused by incomplete executeAsync() function execution

As the title suggests, I was running inference with executeAsync() for an object detection task. The function works well when there is continuously a detection in my live video, and there is no memory leak. However, if there is no detection in one particular frame, an error is thrown while executeAsync() executes.
TypeError: Cannot read property '0' of undefined
And here's the memory leak that happened when there's no detection.
{unreliable: false, numBytesInGPU: 422861990, numTensors: 1111, numDataBuffers: 673, numBytes: 320663096}
{unreliable: false, numBytesInGPU: 490830799, numTensors: 1263, numDataBuffers: 752, numBytes: 371606568}
{unreliable: false, numBytesInGPU: 558799608, numTensors: 1415, numDataBuffers: 831, numBytes: 422550040}
I think the issue lies in the dangling tensors that are created inside executeAsync(): if there is a detection I dispose of the tensors, but if there is no detection I can't get a reference to the dangling tensors, since the test_result variable is never assigned due to the aforementioned error.
I wanted to use tf.tidy() but I did some research on it and apparently it can't be used with asynchronous functions.
How can I eradicate the memory leak? Or is there a problem with my model inference? Really appreciate any help.
async detect(video) {
  console.log("detecting...");
  var example = tf.browser.fromPixels(video);
  var tf4d = example.expandDims(0);
  try {
    // error if there's no detection
    const test_result = await this.model.executeAsync({ image_tensor: tf4d },
        ['detection_boxes', 'num_detections', 'detection_classes', 'detection_scores']) as tf.Tensor[];
    this.no_detection = false;
    var detection_boxes = test_result[0].dataSync();
    var num_detections = test_result[1].dataSync();
    var detection_classes = test_result[2].dataSync();
    var detection_scores = test_result[3].dataSync();
    // dispose tensors to avoid memory leak
    await Promise.all(test_result.map(t => t.data()));
    test_result.map(t => t.dispose());
    tf.dispose(test_result);
    tf4d.dispose();
    example.dispose();
    console.log(tf.memory());
  }
  catch (error) {
    console.log("No detection");
    console.log(error);
    this.no_detection = true;
    // dispose of tensors created from the image if there's no detection
    tf4d.dispose();
    example.dispose();
    console.log(tf.memory());
  }
}
Edited: I think it's also worth mentioning that I retrained my own model using the TensorFlow Object Detection API following this tutorial.
I think there might also be some problem with the export of the inference graph using export_inference_graph.py, since I also got this
23 ops no flops stats due to incomplete shapes.
while converting.
I tried converting a pretrained inference graph to a TFJS model and it works fine: if there's no detection, executeAsync() just returns an empty array.
That's the reason why I think there may be a problem with the export of my own model's inference graph.
Here's the command (I tried both with and without the --input_shape argument):
python export_inference_graph.py --input_type image_tensor --input_shape 1,300,300,3 --pipeline_config_path training/ssd_mobilenet_v2.config --trained_checkpoint_prefix training/model.ckpt-38153 --output_directory trained-inference-graphs/output_inference_graph

Change loss function to always contain the whole object in the TensorFlow Object Detection API

I am working on a number detector and use the Object Detection API from TensorFlow. Sometimes the predicted bounding box does not contain the whole number, which then cannot be read. I would like to change the loss function to penalize much more heavily when part of a number is missing than when the predicted bounding box is too large.
I found the definition of IoU in the file utils/np_box_ops.py, but it is probably not used during training. Where can I find the implementation of the loss function used during training?
First of all, keep in mind that there might be a problem with your dataset and/or the model/config you are using that we can't be aware of, because you didn't share any information about those things.
With that said, the available loss functions are defined in:
https://github.com/tensorflow/models/blob/master/research/object_detection/core/losses.py
With corresponding .proto definitions for your config file in:
https://github.com/tensorflow/models/blob/master/research/object_detection/protos/losses.proto
You might be interested in trying WeightedIOULocalizationLoss.
You can also try to adjust the parameter localization_weight in the loss section of your config file:
loss {
  classification_loss {
    weighted_sigmoid {
    }
  }
  localization_loss {
    weighted_smooth_l1 {
    }
  }
  hard_example_miner {
    num_hard_examples: 3000
    iou_threshold: 0.99
    loss_type: CLASSIFICATION
    max_negatives_per_positive: 3
    min_negatives_per_image: 0
  }
  classification_weight: 1.0
  localization_weight: 1.0
}
And, as a little hack, you could try to post-process the predicted boxes by adding a little offset.
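For that post-processing hack, here is a minimal NumPy sketch (not part of the Object Detection API; it assumes the usual normalized [ymin, xmin, ymax, xmax] box format) that pads every predicted box by a fixed margin and clips it back into the image:

import numpy as np

def expand_boxes(boxes, margin=0.02):
    """boxes: (N, 4) array of normalized [ymin, xmin, ymax, xmax] detections."""
    ymin, xmin, ymax, xmax = np.split(boxes, 4, axis=1)
    expanded = np.concatenate(
        [ymin - margin, xmin - margin, ymax + margin, xmax + margin], axis=1)
    return np.clip(expanded, 0.0, 1.0)  # keep coordinates inside the image

# Example: pad detections by 2% of the image size on every side.
boxes = np.array([[0.10, 0.20, 0.45, 0.60]])
print(expand_boxes(boxes))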

Encog binary classification score for ROC

I am working on a binary classifier using Encog (via Java). I have it set up using an SVM or neural network, and I want to evaluate the quality of the different models using (in part) the area under the ROC curve.
More specifically, I would ideally like to convert the output of the model into some kind of prediction confidence score that can be used for rank ordering in the ROC, but I have yet to find anything in the documentation.
In the code, I get the model results with something like:
MLData result = ((MLRegression) method).compute( pair.getInput() );
String classification = normHelper.denormalizeOutputVectorToString( result )[0];
How do I also get a numerical confidence of the classification?
I have found a way to coax prediction probabilities out of an SVM inside the Encog framework. This method relies upon the equivalent of the -b option of libSVM (see http://www.csie.ntu.edu.tw/~cjlin/libsvm/index.html).
To do this, override the SVM class from Encog. The constructor will enable probability estimates via the svm_parameter object (see below). Then, when doing the calculation, call the method svm_predict_probability as shown below.
Caveat: below is only a code fragment, and in order to be useful you will probably need to write other constructors and pass the resulting probabilities out of the methods below. This fragment is based upon Encog version 3.3.0.
public class MySVMProbability extends SVM {

    public MySVMProbability(SVM method) {
        super(method.getInputCount(), method.getSVMType(), method.getKernelType());
        // Enable probability estimates
        getParams().probability = 1;
    }

    @Override
    public int classify(final MLData input) {
        svm_model model = getModel();
        if (model == null) {
            throw new EncogError(
                    "Can't use the SVM yet, it has not been trained, "
                    + "and no model exists.");
        }
        final svm_node[] formattedInput = makeSparse(input);
        final double probs[] = new double[svm.svm_get_nr_class(getModel())];
        final double d = svm.svm_predict_probability(model, formattedInput, probs);
        /* probabilities for each class are in probs[] */
        return (int) d;
    }

    @Override
    public MLData compute(MLData input) {
        svm_model model = getModel();
        if (model == null) {
            throw new EncogError(
                    "Can't use the SVM yet, it has not been trained, "
                    + "and no model exists.");
        }
        final MLData result = new BasicMLData(1);
        final svm_node[] formattedInput = makeSparse(input);
        final double probs[] = new double[svm.svm_get_nr_class(getModel())];
        final double d = svm.svm_predict_probability(model, formattedInput, probs);
        /* probabilities for each class are in probs[] */
        result.setData(0, d);
        return result;
    }
}
Encog has no direct support for ROC curves. A ROC curve is more of a visualization than an actual model type, and model types are the primary focus of Encog.
Generating a ROC curve for SVMs and neural networks is somewhat different. For a neural network, you must establish thresholds for the classification neurons. There is a good paper about that here: http://www.lcc.uma.es/~jja/recidiva/048.pdf
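This is not an Encog API, but to illustrate the thresholding idea, here is a short language-agnostic sketch (written in Python purely for brevity) of how ROC points are obtained by sweeping a decision threshold over the raw output scores and computing the true and false positive rates at each threshold. It assumes both classes are present in the labels.

def roc_points(scores, labels):
    """scores: predicted confidences; labels: 1 = positive, 0 = negative."""
    points = []
    for threshold in sorted(set(scores), reverse=True):
        predicted = [1 if s >= threshold else 0 for s in scores]
        tp = sum(p == 1 and y == 1 for p, y in zip(predicted, labels))
        fp = sum(p == 1 and y == 0 for p, y in zip(predicted, labels))
        fn = sum(p == 0 and y == 1 for p, y in zip(predicted, labels))
        tn = sum(p == 0 and y == 0 for p, y in zip(predicted, labels))
        points.append((fp / (fp + tn), tp / (tp + fn)))  # (FPR, TPR)
    return points

print(roc_points([0.9, 0.7, 0.4, 0.2], [1, 1, 0, 0]))  # perfectly separated example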
I may add direct support for ROC curves to Encog in the future. They are becoming a very common visualization.