While serving TensorFlow models via TensorFlow Serving, I need to expose custom metadata to the clients (e.g. a model's input data requirements, training information...).
I tried adding the information via tf.add_to_collection( <my_custom_key>, <value> ) before saving the model, and, sure enough, the information showed up in the .pb(txt) file used by the server.
However, the response to a metadata query (e.g. via GET http://localhost:8501/v1/models/<my_model>/metadata) currently only returns the contents of the signature_def section (which cannot be extended either; the validator prevents that), and I know of no way to query the contents of other sections.
Is there a way to serve/query custom meta data for TF Serving?
Unfortunately, adding logic to serve metadata other than SignatureDefs is not on the roadmap right now, and I'm not sure we have a good understanding of the general use case for which supporting this would make sense.
Regarding how to serve the metadata stored in the saved model: presumably you'd add a constant to your graph holding the value of interest (e.g. the input/output shape), create a new signature using the link below, and do inference with that signature -- I've never seen this done, but I can't imagine why it wouldn't work; a rough sketch follows the link.
https://www.tensorflow.org/guide/saved_model#manually_build_a_savedmodel
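For concreteness, here is a minimal TF1-style sketch of that idea. The export path, the 'metadata' signature name, the JSON string, and the dummy input are all made-up illustrations; in a real export you would add this signature alongside your model's regular serving signature in the same signature_def_map.

import tensorflow as tf  # TF1-style API, matching the guide linked above

graph = tf.Graph()
with graph.as_default():
    # Dummy input so the Predict signature is well-formed; it is never used.
    dummy = tf.placeholder(tf.string, shape=[], name='dummy')
    # The metadata of interest, baked into the graph as a constant.
    meta = tf.constant('{"input_shape": [1, 224, 224, 3]}', name='model_metadata')

    with tf.Session() as sess:
        builder = tf.saved_model.builder.SavedModelBuilder('/tmp/my_model/1')
        metadata_signature = tf.saved_model.signature_def_utils.build_signature_def(
            inputs={'dummy': tf.saved_model.utils.build_tensor_info(dummy)},
            outputs={'metadata': tf.saved_model.utils.build_tensor_info(meta)},
            method_name=tf.saved_model.signature_constants.PREDICT_METHOD_NAME)
        builder.add_meta_graph_and_variables(
            sess,
            [tf.saved_model.tag_constants.SERVING],
            signature_def_map={'metadata': metadata_signature})
        builder.save()

Once the model is loaded by the server, calling the Predict endpoint with signature_name set to 'metadata' should then return the constant.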
While I still have not found a solution for TensorFlow Serving, it may be of interest to other readers that this can be achieved with NVIDIA's Triton Inference Server.
While evaluating it as a TFS alternative (mainly for its built-in support for other model formats such as PyTorch and ONNX), I found that in Triton it is possible to serve custom meta information via the Model Configuration Extension, using the 'parameters' property. After adding
parameters: {
  key: "inference_properties"
  value: {
    string_value: "<my-custom-inference-property-info>"
  }
}
to the model's config.pbtxt file, I could retrieve the information on the client side. It's not extremely convenient, as one can only provide a flat map with string values, but still.
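For completeness, retrieval works through Triton's model-configuration endpoint. A rough sketch, assuming the default HTTP port 8000, a model named my_model, and the usual JSON mapping of the config protobuf (double-check the field layout against your Triton version):

import requests

# Query Triton's model configuration extension for the deployed model.
resp = requests.get("http://localhost:8000/v2/models/my_model/config")
resp.raise_for_status()
config = resp.json()

# Custom entries appear under "parameters" as a flat map of string values.
print(config["parameters"]["inference_properties"]["string_value"])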
We are experimenting with our own MLIR stack to import TFL models and compile them for a specific accelerator. We are also building our own runtime/simulator to run these imported models. Our current way of working is to freeze the tf.keras model, convert it to TFL, and then use flatbuffer_translate to get the MLIR tfl dialect.
Toward this goal, however, I need to pass some attributes, specific to our target architecture, along with certain operations. I initially wanted to attach these attributes to an operation such as conv2d. However, I don't know of a way (if it is possible at all) to extend operations that are natively defined/supported by tfl.
I then tried to define and register a custom TF operation with its own custom attributes. The semantics of the operation would be an identity function; I just intended to use it as a placeholder to carry my attributes. When I tried this, I saw that the resulting TFL MLIR contains my custom op; however, the attributes are encoded into an opaque type with a byte stream as its value.
I could not find much documentation on how to decode these attributes. I'd appreciate any tips on decoding, or any other suggestion that helps achieve our goal.
Thanks!
How are you encoding it in the input? I'm guessing you are seeing it encoded as an AttrValue (https://github.com/tensorflow/tensorflow/blob/b1e813e2ec9634ec0e6562b836e372e393f3de43/tensorflow/core/framework/attr_value.proto#L18) and so you'd decode it as you would a protobuf normally.
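If it is indeed an AttrValue, decoding is just ordinary protobuf parsing. A small self-contained sketch, assuming the byte stream really is a serialized AttrValue (the round-trip at the end only serves as a demo):

from tensorflow.core.framework import attr_value_pb2

def decode_custom_attr(raw_bytes):
    # Parse the opaque byte stream as a tensorflow.AttrValue protobuf.
    attr = attr_value_pb2.AttrValue()
    attr.ParseFromString(raw_bytes)
    return attr

# Demo: serialize an AttrValue and decode it again to show the step.
example = attr_value_pb2.AttrValue(i=42)
decoded = decode_custom_attr(example.SerializeToString())
print(decoded.WhichOneof("value"), decoded.i)  # -> i 42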
I am fairly new to TensorFlow (and SageMaker) and am stuck in the process of deploying a SageMaker endpoint. I have just recently succeeded in creating a SavedModel-format model, which is currently being used to serve a sample endpoint (the model was created externally). However, when I checked the image I am using for the endpoint, it says '.../tensorflow-inference', which is not the direction I want to go in, because I want to use a SageMaker TensorFlow Serving container (I followed tutorials from the official TensorFlow Serving GitHub repo using sample models, and those deployed correctly using the TensorFlow Serving framework).
Am I encountering this issue because my SavedModel does not have the correct 'serving' tag? I have not checked my tag sets yet, but I wanted to know if this could be the core reason for the problem. Also, most importantly, what are the differences between the two container types? I think a better understanding of these two concepts would show me why I am unable to produce the correct image.
This is how I deployed the sample endpoint:
model = Model(model_data=...)
predictor = model.deploy(initial_instance_count=...)
When I run the code, I get a model, an endpoint configuration, and an endpoint. I got the container type by clicking on model details within the AWS SageMaker console.
There are two APIs for deploying TensorFlow models: tensorflow.Model and tensorflow.serving.Model. It isn't clear from the code snippet which one you're using, but the SageMaker docs recommend the latter for deploying from pre-existing S3 artifacts:
from sagemaker.tensorflow.serving import Model
model = Model(model_data='s3://mybucket/model.tar.gz', role='MySageMakerRole')
predictor = model.deploy(initial_instance_count=1, instance_type='ml.c5.xlarge')
Reference: https://github.com/aws/sagemaker-python-sdk/blob/c919e4dee3a00243f0b736af93fb156d17b04796/src/sagemaker/tensorflow/deploying_tensorflow_serving.rst#deploying-directly-from-model-artifacts
it says '.../tensorflow-inference', which is not the direction I want to go in because I want to use a SageMaker TensorFlow serving container
If you haven't specified an image argument for tensorflow.Model, SageMaker should be using the default TensorFlow serving image (seems like "../tensorflow-inference").
image (str) – A Docker image URI (default: None). If not specified, a default image for TensorFlow Serving will be used.
If all of this seems needlessly complex to you, I'm working on a platform that makes this setup a single line of code -- I'd love for you to try it; DM me at https://twitter.com/yoavz_.
There are different versions of the framework containers. Since the framework version I'm using is 1.15, the image I got had to be a tensorflow-inference container. If I had used versions <= 1.13, I would have gotten sagemaker-tensorflow-serving images. The two aren't the same, but there's no 'correct' container type.
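For illustration, the version is typically chosen via the framework_version argument when constructing the model; the bucket path and role below are placeholders, and the argument name should be checked against your SDK version:

from sagemaker.tensorflow.serving import Model

# framework_version (or the SDK default) determines which serving image backs
# the endpoint: 1.15 resolves to a tensorflow-inference image, while <= 1.13
# resolved to sagemaker-tensorflow-serving images.
model = Model(model_data='s3://mybucket/model.tar.gz',
              role='MySageMakerRole',
              framework_version='1.15')
predictor = model.deploy(initial_instance_count=1, instance_type='ml.c5.xlarge')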
I know how to load a pre-trained image model from TensorFlow Hub, like so:
import tensorflow_hub as hub

# load the module
image_module = hub.Module('https://tfhub.dev/google/imagenet/mobilenet_v2_035_128/feature_vector/2')
# get feature vectors for a batch of images
features = image_module(batch_images)
I also know how to customize the output of this model (fine-tune it on a new dataset). The existing modules expect the input batch_images to be an RGB image tensor.
My question: instead of the input being an RGB image of certain dimensions, I would like to use a tensor (of shape 20x20x128, from a different model) as input to the Hub model. This means I need to bypass the initial layers of the TF Hub model definition (I don't need them). Is this possible with the TF Hub module APIs? The documentation is not clear on this aspect.
P.S.: I can do this easily by defining my own layers, but I am trying to see if I can use the TF Hub APIs.
The existing https://tfhub.dev/google/imagenet/... modules do not support this.
Generally speaking, the hub.Module format allows multiple signatures (that is, combinations of input/output tensors; think feeds and fetches as in tf.Session.run()). So module publishers can arrange for that if there is a common usage pattern they want to support.
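For illustration, this is roughly how a module with more than one signature is invoked with the TF1-style hub.Module API; get_signature_names() lists what the publisher actually provides, and a signature accepting a 20x20x128 feature map would only exist if the publisher added one:

import tensorflow as tf       # TF1-style API
import tensorflow_hub as hub

# RGB input batch, as in the question (128x128 for this module version).
batch_images = tf.placeholder(tf.float32, shape=[None, 128, 128, 3])

image_module = hub.Module('https://tfhub.dev/google/imagenet/mobilenet_v2_035_128/feature_vector/2')
print(image_module.get_signature_names())  # signatures the publisher provides

# Call a named signature explicitly and fetch its outputs as a dict.
outputs = image_module(dict(images=batch_images), signature='default', as_dict=True)
features = outputs['default']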
But for free-form experimentation at this level of sophistication, you are probably better off directly using and tweaking the code that defines the models, such as TF Slim (for TF1.x) or Keras Applications (also for TF2). Both provide ImageNet-pretrained checkpoints for downloading and restoring on the side.
Is it possible to use a different feature extractor with the SSD meta-architecture in TensorFlow's Object Detection API? I know that .config files for MobileNets and Inception are provided, but is it possible to use a different architecture like AlexNet or VGG?
It's possible, but it takes a little bit of work, as explained here; you should read this page for a detailed explanation and links to examples.
In short, you'll need to create a custom FasterRCNNFeatureExtractor class corresponding to VGG or AlexNet (this may require a bit of knowledge about these architectures, for instance the amount of subsampling involved). In this class, you'll code how your data should be preprocessed, how to retrieve the 1st and 2nd stage features (typically, what the last convolutional layer is called), and how to load the weights; a rough skeleton follows below.
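The skeleton below follows the Faster R-CNN naming used in the Object Detection API; the method names should be double-checked against your checkout of object_detection, and the bodies are deliberately left as stubs:

from object_detection.meta_architectures import faster_rcnn_meta_arch

class FasterRCNNVGGFeatureExtractor(faster_rcnn_meta_arch.FasterRCNNFeatureExtractor):
  """Hypothetical VGG-16 feature extractor skeleton."""

  def preprocess(self, resized_inputs):
    # How the data should be preprocessed, e.g. subtract the VGG channel means.
    return resized_inputs - [[123.68, 116.779, 103.939]]

  def _extract_proposal_features(self, preprocessed_inputs, scope):
    # 1st-stage features: build the VGG trunk and return its last conv layer.
    raise NotImplementedError('Build the VGG convolutional trunk here.')

  def _extract_box_classifier_features(self, proposal_feature_maps, scope):
    # 2nd-stage features: the layers applied to each cropped proposal.
    raise NotImplementedError('Build the per-proposal head here.')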
Then you'll need to register your feature extractor (tell the Object Detection API that it exists) by modifying the file object_detection/builders/model_builder.py.
Finally, you should be able to make a config file with your custom feature extractor, et voilà!
I successfully created a server that receives a TF saved_model, but now I want to send it queries and get predictions.
However, I'm having a hard time of understanding how the client works and how to implement it.
All I found online is the basic tutorial, but it only gives the client code for MNIST, and it doesn't fit my own model.
So can anyone refer me to how to use or implement the client for a different model?
Thanks
I am really grateful to Google for making TensorFlow Serving open source; it is so helpful for people like me who want to put prediction models into production. But I have to admit that TensorFlow Serving is poorly documented, or rather, it assumes that people who use it already have pretty good knowledge of TensorFlow. I was stuck for a long time trying to understand how it works. Their website introduces the concepts and examples well, but there is something missing in between.
I recommend the tutorial here. This is the first part; you can also follow the second part, the link to which is in that article.
In general, when you export your .ckpt files to a servable model (a .pb file plus a variables folder), you have to define the input, output, and method name of your model and save them as a signature via tf.saved_model.signature_def_utils.build_signature_def.
In the article, you will find what I said above in this part:
tf.saved_model.signature_def_utils.build_signature_def(
    inputs={'images': predict_tensor_inputs_info},
    outputs={'scores': predict_tensor_scores_info},
    method_name=tf.saved_model.signature_constants.PREDICT_METHOD_NAME)
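For context, the two *_info objects referenced above are typically built like this (TF1-style; the placeholder and identity tensors below just stand in for your model's real input and output):

import tensorflow as tf  # TF1-style API, matching the snippet above

images = tf.placeholder(tf.float32, shape=[None, 28, 28], name='images')
scores = tf.identity(images, name='scores')  # stand-in for the model's output

predict_tensor_inputs_info = tf.saved_model.utils.build_tensor_info(images)
predict_tensor_scores_info = tf.saved_model.utils.build_tensor_info(scores)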
You can follow how the author defined input and output in the article, and do the same thing to your customized model.
After that, in your client script you have to call the signature and feed the input to the server; the server will then recognize which method to use and return the output. You can check how the author wrote the client script and find the corresponding parts that call the signature and feed the input; a minimal gRPC sketch follows below.
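For reference, a minimal gRPC client sketch looks roughly like this; the model name, signature name, and the 'images'/'scores' keys are assumptions and must match whatever you put into your own signature:

import grpc
import numpy as np
import tensorflow as tf
from tensorflow_serving.apis import predict_pb2, prediction_service_pb2_grpc

channel = grpc.insecure_channel('localhost:8500')  # TF Serving's default gRPC port
stub = prediction_service_pb2_grpc.PredictionServiceStub(channel)

request = predict_pb2.PredictRequest()
request.model_spec.name = 'my_model'                   # assumed model name
request.model_spec.signature_name = 'serving_default'  # or your custom signature name

batch = np.zeros((1, 28, 28), dtype=np.float32)        # stand-in input batch
request.inputs['images'].CopyFrom(tf.make_tensor_proto(batch))

response = stub.Predict(request, 10.0)                 # 10-second timeout
scores = tf.make_ndarray(response.outputs['scores'])
print(scores.shape)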