I'm using an AMD GPU, and VideoFileWriter does exactly what I need.
The only thing I haven't found is a way to use the h264_amf encoder instead of the usual h264 encoder.
I am currently trying to quantize a BERT classifier model but am running into an error. Is this even supported at the moment? For clarity, I am asking whether quantization is supported on the BERT Classifier super class in the tensorflow-model-garden. Thanks in advance for the help!
Quantizing the standard BERT classifier is probably not a good way to go if you are interested in running a BERT-like model on a resource-constrained edge device (like a mobile phone). For your specific question, I believe the answer is 'no, quantization of the standard BERT is not supported.' However, a better approach is probably to use one of the smaller BERT-type models that have been created for the edge use case, such as MobileBERT:
https://github.com/google-research/google-research/tree/master/mobilebert
The above link includes scripts for fine-tuning and then converting to TF Lite format in order to run on device.
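For anyone unsure what quantization does to the weights in the first place, here is a minimal sketch of generic affine int8 quantization in plain Python. It is not the scheme used by the TensorFlow Model Garden or TF Lite; the function names and example values are made up for illustration.

```python
def quantize(values, num_bits=8):
    """Map floats to signed integers with a shared scale and zero point."""
    qmin = -(2 ** (num_bits - 1))
    qmax = 2 ** (num_bits - 1) - 1
    # Keep 0.0 inside the range so it stays exactly representable.
    lo = min(min(values), 0.0)
    hi = max(max(values), 0.0)
    scale = (hi - lo) / (qmax - qmin) or 1.0  # avoid a zero scale for all-zero input
    zero_point = round(qmin - lo / scale)
    quantized = [max(qmin, min(qmax, round(v / scale) + zero_point)) for v in values]
    return quantized, scale, zero_point


def dequantize(quantized, scale, zero_point):
    """Recover approximate floats from the integer representation."""
    return [(q - zero_point) * scale for q in quantized]


weights = [-1.0, 0.0, 0.5, 1.0]  # made-up example weights
q, scale, zero_point = quantize(weights)
restored = dequantize(q, scale, zero_point)
for original, approx in zip(weights, restored):
    print(f"{original:+.4f} -> {approx:+.4f}")
```

Each restored value differs from the original by at most about one quantization step (`scale`). That accumulated rounding error is what can hurt accuracy on a large model.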
These are the instructions for solving the assignment:
Convert your TensorFlow model to UFF
Use TensorRT’s C++ API to parse your model to convert it to a CUDA engine.
The TensorRT engine will automatically optimize your model, performing steps
like fusing layers, converting the weights to FP16 (or INT8 if you prefer),
and optimizing to run on Tensor Cores.
Can anyone tell me how to proceed with this assignment? I don't have a GPU in my laptop; is it possible to do this in Google Colab or with an AWS free account?
And which packages do I have to install to run TensorRT on my laptop or in Google Colab?
I haven't used .uff, only .onnx, but from what I've seen the process is similar.
According to the documentation, with TensorFlow you can do something like:
from tensorflow.python.compiler.tensorrt import trt_convert as trt

# frozen_graph is your frozen GraphDef; nodes_blacklist names the output nodes
converter = trt.TrtGraphConverter(
    input_graph_def=frozen_graph,
    nodes_blacklist=['logits', 'classes'])
frozen_graph = converter.convert()
In TensorFlow 1.x this is pretty straightforward: TrtGraphConverter has an option to serialize for FP16, like:
converter = trt.TrtGraphConverter(
    input_saved_model_dir=input_saved_model_dir,
    max_workspace_size_bytes=(11<<32),
    precision_mode="FP16",
    maximum_cached_engines=100)
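A side note on max_workspace_size_bytes: the value is a plain byte count, and it is normally written as a bit shift. Here is a quick check in plain Python (no TensorFlow needed) of how a shift (`<<`) differs from a comparison (`<`), which is an easy typo to make. The variable names are just for illustration.

```python
# The workspace budget is an integer number of bytes.
shifted = 11 << 32   # bit shift: 11 * 2**32 bytes
compared = 11 < 32   # comparison: evaluates to the boolean True

GIB = 1 << 30        # bytes in one gibibyte

workspace_gib = shifted / GIB
print(shifted)                   # 47244640256
print(workspace_gib)             # 44.0
print(compared, int(compared))   # True 1
```

So `11 << 32` asks for 44 GiB, while `11 < 32` would pass `True`, which is effectively 1 byte. Whatever value you pick, it presumably has to fit in your GPU's memory.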
Note the precision_mode argument. Once you have serialized the engine you can load the network easily in TensorRT; some good examples using C++ are here.
Unfortunately, you'll need an NVIDIA GPU with FP16 support; check this support matrix.
If I'm correct, Google Colab offered a Tesla K80 GPU, which does not have FP16 support. I'm not sure about AWS, but I'm certain the free tier does not have GPUs.
Your cheapest option could be buying a Jetson Nano, which costs around $90. It's a very powerful board and I'm sure you'll use it in the future. Or you could rent an AWS GPU server, but that is a bit expensive and the setup process is a pain.
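If you just want a feel for what storing weights in FP16 does to their values, you can try it on a CPU without any GPU. This is a small sketch using the `'e'` (IEEE 754 half precision) format of Python's standard `struct` module. It says nothing about how TensorRT converts weights internally, and the example values are made up.

```python
import struct


def to_fp16_and_back(value: float) -> float:
    """Round-trip a Python float through IEEE 754 half precision."""
    return struct.unpack("<e", struct.pack("<e", value))[0]


weights = [0.1, 1.0, 3.14159, 2049.5]  # made-up example values
for w in weights:
    w16 = to_fp16_and_back(w)
    print(f"{w!r} -> {w16!r} (abs error {abs(w - w16):.6f})")
```

Values such as 1.0 survive unchanged, while 0.1 or anything above 2048 picks up a visible rounding error, since half precision keeps only 11 significant bits.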
Best of luck!
Export and convert your TensorFlow model into an .onnx file.
Then, use this onnx-tensorrt tool to do the CUDA engine file conversion.
I notice that the code for the TensorFlow Object Detection API contains several references to Mask R-CNN, but the documentation does not mention it. Is it possible to train/run Mask R-CNN through this API, and if so, how?
You may not like it, but the answer (for the moment) is no. The API cannot be used to predict or recover masks.
They only use a little part of the Mask R-CNN paper to predict boxes in a certain way, but predicting the instance masks is not yet implemented.
Mask R-CNN can now be implemented with faster_rcnn_inception_v2; there are samples for TensorFlow version 1.8.0.
I'm trying to perform object detection on a custom, relatively easy dataset (with ~30k samples). I've successfully used Faster_RCNN with Resnet101_v1 (final mAP 0.9) and inception_resnet_v2 feature extractors (training in progress). Now I would like my model to run faster but still keep good performance, so I'd like to compare the ones I have, with SSD running with various versions of mobile_net. However, to know which changes in performance come from SSD and which come from the feature extractor, I'd like to also try Faster-RCNN with mobile_nets. It's also possible that this yields the tradeoff I need between performance and inference time (faster RCNN being good and slow, and mobile_nets fast).
The original MobileNets paper mentions using it with Faster RCNN, and I guess they used the TensorFlow Object Detection API, so maybe they've released the files to adapt MobileNets to Faster RCNN?
How can I make mobile_nets compatible with Faster-RCNN?
In a nutshell, a MobileNet version of the Faster-RCNN Feature Extractor will need to be created. This is something we are looking at adding, but is not a current priority.
I am not an expert, but as far as I know, you can't use mobilenets with faster_rcnn; mobilenets is based on yolo, which is a different architecture from faster_rcnn.
Google released its Object Detection Model recently.
https://github.com/tensorflow/models/tree/master/object_detection
With this API you can easily replace the feature extractor (Xception, Inception ResNet, DenseNet, or MobileNet) of a current object detector.
There are two common parts in many object recognition systems. The first part is the feature extractor (extracting features such as edges, lines, and colors from the image input). The second part is the object detector (Faster R-CNN, SSD, YOLOv2).
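To make the two-part split concrete, here is a toy sketch in plain Python where the detector head is written against a generic feature extractor, so the extractor can be swapped. None of this is the Object Detection API: every name, the "image" format, and the thresholding "head" are hypothetical stand-ins.

```python
from typing import Callable, List, Tuple

Image = List[List[float]]        # toy grayscale image: rows of pixel values
Features = List[float]           # toy feature vector
Box = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max)


def mean_rows_extractor(image: Image) -> Features:
    """Hypothetical 'heavier' backbone stand-in: one feature per row."""
    return [sum(row) / len(row) for row in image]


def global_mean_extractor(image: Image) -> Features:
    """Hypothetical 'lighter' backbone stand-in: a single feature."""
    pixels = [p for row in image for p in row]
    return [sum(pixels) / len(pixels)]


def make_detector(extractor: Callable[[Image], Features],
                  threshold: float) -> Callable[[Image], List[Box]]:
    """Build a detector from any extractor; the head only sees features."""
    def detect(image: Image) -> List[Box]:
        features = extractor(image)
        height, width = len(image), len(image[0])
        # Toy head: report the whole image as one box if any feature is strong.
        if max(features) > threshold:
            return [(0, 0, width, height)]
        return []
    return detect


image = [[0.0, 0.0], [1.0, 1.0]]
heavy = make_detector(mean_rows_extractor, threshold=0.75)
light = make_detector(global_mean_extractor, threshold=0.75)
print(heavy(image))  # [(0, 0, 2, 2)]
print(light(image))  # []
```

The head is identical in both cases; only the extractor changes, and so does the result. That is the kind of comparison you would want when separating the effect of the backbone from the effect of the detector.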
https://www.tensorflow.org/performance/performance_guide#use_nchw_image_data_format
I've read that cuDNN has better performance with NCHW (feature maps on the second axis) but that NHWC is better on CPU (feature maps on last axis).
As of TensorFlow 1.2, I wonder if it's still recommended to manually support both formats, or if it's reasonable to expect tf.train, tf.layers etc. to automatically take care of dimension reordering as needed (I believe they should!). Manually supporting both data formats feels ugly and like a leaky abstraction with implementation details that I as a TensorFlow user should not have to know about, hence I'd like to avoid it.
Also, how much of a performance improvement would one reasonably expect to gain from GPU training with NCHW instead of NHWC?
It would be interesting to know where you found that CPU executions are faster in NHWC mode. The Intel MKL library for DNN uses the NCHW format by default and, as I understand it, uses yet another opaque, SIMD-friendly format internally. So if you go NCHW, at least you wouldn't have to maintain two versions.
I don't know what order of gain you could expect. As cuDNN uses the NCHW order itself, I suppose TensorFlow does not convert formats back and forth at each layer, but converts back into NHWC only when needed (e.g. when you explicitly ask for the tensor values). So unless you do lots of exotic stuff outside of standard cuDNN operations, I would not be surprised if gains are minor. But it is just an uneducated guess.
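For reference, the two layouts differ only in how the same values are ordered in memory. Here is a small plain-Python sketch of the index arithmetic, assuming row-major flat buffers. The helper names are mine, and it does not model what TensorFlow or cuDNN do internally.

```python
def offset_nhwc(n, h, w, c, shape):
    """Flat index of element (n, h, w, c) in a row-major NHWC buffer."""
    _, H, W, C = shape
    return ((n * H + h) * W + w) * C + c


def offset_nchw(n, c, h, w, shape):
    """Flat index of element (n, c, h, w) in a row-major NCHW buffer."""
    _, C, H, W = shape
    return ((n * C + c) * H + h) * W + w


def nhwc_to_nchw(flat, shape_nhwc):
    """Reorder a flat NHWC buffer into NCHW order."""
    N, H, W, C = shape_nhwc
    out = [None] * len(flat)
    for n in range(N):
        for h in range(H):
            for w in range(W):
                for c in range(C):
                    src = offset_nhwc(n, h, w, c, shape_nhwc)
                    dst = offset_nchw(n, c, h, w, (N, C, H, W))
                    out[dst] = flat[src]
    return out


# One image, 1x2 pixels, 3 channels: NHWC keeps each pixel's channels together.
nhwc = ["p0.R", "p0.G", "p0.B", "p1.R", "p1.G", "p1.B"]
print(nhwc_to_nchw(nhwc, (1, 1, 2, 3)))
# ['p0.R', 'p1.R', 'p0.G', 'p1.G', 'p0.B', 'p1.B']
```

In NCHW each channel plane is contiguous, while in NHWC all channels of one pixel sit next to each other. Any conversion between the two is a full reshuffle like the loop above, so how often it happens matters for performance.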