What is the main difference between .pb format of tensorflow and .h5 format of keras to store models? Is there any reason to choose one over the other?
These are different file formats with different characteristics, both used by TensorFlow to save models (.h5 specifically by Keras).
.pb - protobuf
It is a way to store structured data (in this case a neural network). The project is open source and is currently overseen by Google.
Example
person {
name: "John Doe"
email: "jdoe#example.com"
}
This is a simple message containing two fields. You can load it in one of the many supported languages (e.g. C++, Go), parse it, modify it, and send it to someone else in binary format.
Advantages
really small and efficient to parse (when compared to say .xml), hence often used for data transfer across the web
used by TensorFlow Serving when you want to take your model to production (e.g. inference over the web)
language agnostic - binary format can be read by multiple languages (Java, Python, Objective-C, and C++ among others)
the advised format since tf2.0; you can see the official serialization guide
saves various metadata (optimizers, losses etc. if using a Keras model)
Disadvantages
SavedModel is conceptually harder to grasp than single file
creates a folder containing the weights rather than a single file
Sources
You can read about this format here
.h5 - HDF5 binary data format
Used originally by keras to save models (keras is now officially part of tensorflow). It is less general and more "data-oriented", less programmatic than .pb.
Advantages
Designed to store large amounts of data (so large neural networks fit well)
Common file saving format
Everything saved in one file (weights, losses, optimizers used with keras etc.)
Disadvantages
Cannot be used directly with TensorFlow Serving, but you can convert it to .pb via keras.experimental.export_saved_model(model, 'path_to_saved_model'), as sketched below
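As a rough sketch of the round trip between the two formats (assuming TF 2.x with the tf.keras API; the tiny model and paths are made up for illustration, and the default save format differs in newer Keras versions):
import tensorflow as tf

# A tiny stand-in model for illustration.
model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(3,))])
model.compile(optimizer="sgd", loss="mse")

# Single-file HDF5 save: architecture, weights and optimizer state together.
model.save("model.h5")

# Reload and re-export as a SavedModel directory (saved_model.pb + variables/),
# which is what TensorFlow Serving expects. In TF 2.x a path without the .h5
# suffix is saved in the SavedModel format.
reloaded = tf.keras.models.load_model("model.h5")
reloaded.save("exported_model")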
All in all
Use the simpler one (.h5) if you don't need to productionize your model (or production is reasonably far away). Use .pb if you are going to production or just want to standardize on a single format across all TensorFlow-provided tools.
Related
Tensorflow has several types of model formats:
1. TensorFlow SavedModel
2. Frozen Model
3. Session Bundle
4. TensorFlow Hub module
How can you distinguish between them on-disk? (to later use with tensorflowjs-converter)
And how is each model created?
Yup, there are a LOT of different model types, and they all have good reasons. I'm not going to claim that I have perfect clarity of each, but here's what I know (I think I know).
The .pb file: PB stands for protobuf, or Protocol Buffer. This is the model structure, generally without the trained weights, stored in a binary format.
The .pbtxt file: A non-binary (text) version of the .pb file, for human reading.
Protobuf files that aren't frozen also need a checkpoint .ckpt file. The checkpoint file holds the weights that the .pb file is missing.
The .h5 file: The model + weights from a Keras save
The .tflite file would be a TensorflowLite model
Frozen Model: A frozen model combines the .pb with the weights file, so you don't have to manage two of them; usually this just means adding the word "frozen" to the filename. I'm sure this could be inferred when loading the file, but on disk it's a bit more on the honor system: there is no .ckpt file, and extraneous graph info has been stripped out. It's basically the "production-ready" version of the model.
Session Bundle: A directory format. It is no longer used and rarely seen.
Tensorflow Hub Module: These are pre-existing popular models that are very likely already exported to TFJS and don't require you to manually convert them. I assume they are supported for Google's benefit, more so than ours. But it's nice to know if you use hub, you can always convert it.
A model exported in multiple formats produces a grouping of several of these files side by side; from there, you can see quite a few that you could turn into TFJS.
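As a rough illustration of what a frozen .pb actually is (a sketch assuming TF 2.x with the compat API; the file name frozen_model.pb is a placeholder), you can parse it directly as a GraphDef:
import tensorflow as tf

# A frozen .pb is just a serialized GraphDef protobuf with the weights
# baked in as constants, so it parses directly.
graph_def = tf.compat.v1.GraphDef()
with open("frozen_model.pb", "rb") as f:
    graph_def.ParseFromString(f.read())

# List some of the operations it contains. Note that a SavedModel's
# saved_model.pb uses a different protobuf schema and is not interchangeable.
print([node.name for node in graph_def.node][:10])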
TensorFlow Object Detection API prefers TFRecord file format. MXNet and Amazon Sagemaker seem to use RecordIO format. How are these two binary file formats different, or are they the same thing?
RecordIO and TFRecord are the same in the sense that they serve the same purpose - to put data in one sequence for faster reading - and both of them use Protocol Buffers under the hood for better space allocation.
It seems to me that RecordIO is more like an umbrella term: a format used to store a huge chunk of data in one file for faster reading. Some products adopt "RecordIO" as an actual name, but in TensorFlow they decided to use a specific word for it - TFRecord. That's why some people call TFRecord a "TensorFlow-flavored RecordIO format".
There is no single RecordIO format as such. The people from Apache Mesos, who also call their format RecordIO, say: "Since there is no formal specification of the RecordIO format, there tend to be slight incompatibilities between RecordIO implementations". And their RecordIO format is different from the one MXNet uses - I don't see a "magic number" at the beginning of each record.
So, at the structural level, TensorFlow's TFRecord and MXNet's RecordIO are different file formats, e.g. you shouldn't expect MXNet to be able to read a TFRecord and vice versa. But at a logical level they serve the same purpose and can be considered similar.
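To make the TFRecord side concrete, here is a minimal sketch (assuming TF 2.x; the file name and feature key are made up for illustration) of writing a couple of records and reading them back sequentially:
import tensorflow as tf

# Write two serialized records into one TFRecord file.
with tf.io.TFRecordWriter("sample.tfrecord") as writer:
    for value in [b"first record", b"second record"]:
        example = tf.train.Example(features=tf.train.Features(feature={
            "data": tf.train.Feature(bytes_list=tf.train.BytesList(value=[value]))}))
        writer.write(example.SerializeToString())

# Read them back as a sequential stream.
for raw in tf.data.TFRecordDataset("sample.tfrecord"):
    parsed = tf.io.parse_single_example(raw, {"data": tf.io.FixedLenFeature([], tf.string)})
    print(parsed["data"].numpy())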
I would like to convert a TRT optimized frozen model to saved model for tensorflow serving. Are there any suggestions or sources to share?
Or are there any other ways to deploy a TRT optimized model in tensorflow serving?
Thanks.
Assuming you have a TRT-optimized model (i.e., the model is already represented in UFF), you can simply follow the steps outlined here: https://docs.nvidia.com/deeplearning/sdk/tensorrt-developer-guide/index.html#python_topics. Pay special attention to sections 3.3 and 3.4, since in these sections you actually build the TRT engine and then save it to a file for later use. From that point forward, you can just re-use the serialized engine (aka a PLAN file) to do inference.
Basically, the workflow looks something like this (a rough code sketch of the last two steps follows the list):
Build/train model in TensorFlow.
Freeze model (you get a protobuf representation).
Convert model to UFF so TensorRT can understand it.
Use the UFF representation to build a TensorRT engine.
Serialize the engine and save it to a PLAN file.
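As a hedged sketch of steps 4 and 5, using the legacy UFF Python API that the TensorRT samples are built on (the input/output names, shapes and file names here are placeholders; exact calls depend on your TensorRT version):
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

# Step 4: build a TensorRT engine from the UFF representation.
with trt.Builder(TRT_LOGGER) as builder, \
     builder.create_network() as network, \
     trt.UffParser() as parser:
    parser.register_input("input", (1, 28, 28))   # placeholder name/shape
    parser.register_output("output")              # placeholder name
    parser.parse("model.uff", network)
    builder.max_workspace_size = 1 << 30
    engine = builder.build_cuda_engine(network)

    # Step 5: serialize the engine to a PLAN file for later reuse.
    with open("model.plan", "wb") as f:
        f.write(engine.serialize())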
Once those steps are done (and you should have sufficient example code in the link I provided) you can just load the PLAN file and re-use it over and over again for inference operations.
If you are still stuck, there is an excellent example that is installed by default here: /usr/src/tensorrt/samples/python/end_to_end_tensorflow_mnist. You should be able to use that example to see how to get to the UFF format. Then you can just combine that with the example code found in the link I provided.
I haven't tried TensorFlow yet but am still curious: how does it store the acquired learning of a machine learning program for later use - in what form, data type, and file type?
For example, TensorFlow was used to sort cucumbers in Japan. The computer took a long time to learn, from the example images given, what good cucumbers look like. In what form was that learning saved for future use?
Because I think it would be inefficient if the program had to re-learn from the images every time it needs to sort cucumbers.
Ultimately, a high level way to think about a machine learning model is three components - the code for the model, the data for that model, and metadata needed to make this model run.
In TensorFlow, the code for this model is written in Python and is saved in what is known as a GraphDef. This uses a serialization format created at Google called Protobuf. Other libraries use common serialization formats such as Python's native Pickle.
The main reason you write this code is to "learn" from some training data - which is ultimately a large set of matrices, full of numbers. These are the "weights" of the model - and this too is stored using ProtoBuf, although other formats like HDF5 exist.
Tensorflow also stores Metadata associated with this model - for instance, what should the input look like (eg: an image? some text?), and the output (eg: a class of image aka - cucumber1, or 2? with scores, or without?). This too is stored in Protobuf.
During prediction time, your code loads up the graph, the weights and the meta - and takes some input data to give out an output. More information here.
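As a minimal sketch of that save-once, reuse-later pattern (assuming TF 2.x with the tf.keras API; the tiny model and directory name are hypothetical stand-ins for the cucumber sorter):
import tensorflow as tf

# A tiny hypothetical classifier standing in for the cucumber sorter.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(16, activation="relu", input_shape=(4,)),
    tf.keras.layers.Dense(3, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
# ... train once on the example images (omitted) ...

# Persist graph, weights and metadata to disk as a SavedModel directory.
model.save("cucumber_model")

# Later, in another process: load and predict without re-learning anything.
restored = tf.keras.models.load_model("cucumber_model")
print(restored.predict(tf.random.uniform((1, 4))))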
Are you talking about the symbolic math library, or the idea of tensor flow in general? Please be more specific here.
Here are some resources that discuss the library and tensor flow
These are some tutorials
And here is some background on the field
And this is the github page
If you want a more specific answer, please give more details as to what sort of work you are interested in.
Edit: So I'm presuming your question is more related to the general field of tensor flow than any particular application. Your question still is too vague for this website, but I'll try to point you toward a few resources you might find interesting.
TensorFlow as used in image recognition typically acts on an ANN (Artificial Neural Network). What this means is that the TensorFlow library helps with the number crunching for the neural network, which I'm sure you can read all about with a quick Google search.
The point is that TensorFlow isn't a form of machine learning itself; it serves more as a useful number-crunching library, similar to something like numpy in Python, in large-scale deep learning simulations. You should read more here.
I am a CNTK user. I use AlexNet but would like a more compact NN -- so SqueezeNet seems to be of interest. Or does someone have some other suggestion? How do CNTK users deploy when size matters? Does somebody have a CNTK implementation of SqueezeNet?
The new ONNX model format now has several pretrained vision models, including one for SqueezeNet. You can download the model and load it into CNTK:
import cntk as C
z = C.Function.load("<path of your ONNX model>", format=C.ModelFormat.ONNX)
You can find tutorials for importing/exporting ONNX models in CNTK here.
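Export works the same way; as a hedged sketch (the tiny network and file names are placeholders), a trained or loaded CNTK function can be written back out in either format:
import cntk as C

# A tiny placeholder function standing in for your trained network.
x = C.input_variable(3)
z = C.layers.Dense(2)(x)

z.save("model.cntkmodel")                          # native CNTK v2 format
z.save("model.onnx", format=C.ModelFormat.ONNX)    # ONNX export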
SqueezeNet is a good choice for a small network with the possibility of a good accuracy. Take a look at DSDSqueezeNet for even better accuracy.
However, if it does not need to be as small as SqueezeNet you also could take a look at MobileNet or NasNet Mobile. These networks may be bigger, but they provide state of the art performance in the task of image classification.
Unfortunately, I do not have a CNTK implementation of SqueezeNet, but maybe a pretrained CNTK model, which you can reuse and fine-tune using transfer learning, is all you are looking for. In this case I can recommend MMdnn, a conversion tool which allows you to convert an existing pretrained Caffe network to the CNTK model format. In this issue you can find a step-by-step guide for SqueezeNet.
I do not know of a method for especially small deployment, but you basically have two choices when it comes to saving your model: the standard CNTK model format and the new ONNX format, which CNTK either already supports or will support soon. I have not been able to try it myself yet, but maybe it offers a smaller size for the same network.
Since the CNTK model format already saves the model in binary, I would not expect big improvements from any format anyway. Still, compressing the model could definitely be an option if size is very important.