Get info of exposed models in Tensorflow Serving - tensorflow

Once I have a TF server serving multiple models, is there a way to query such server to know which models are served?
Would it be possible then to have information about each of such models, like name, interface and, even more important, what versions of a model are present on the server and could potentially be served?

It is really hard to find some info about this, but there is possibility to get some model metadata.
request = get_model_metadata_pb2.GetModelMetadataRequest()
request.model_spec.name = 'your_model_name'
request.metadata_field.append("signature_def")
response = stub.GetModelMetadata(request, 10)
print(response.model_spec.version.value)
print(response.metadata['signature_def'])
Hope it helps.
Update
Is is possible get these information from REST API. Just get
http://{serving_url}:8501/v1/models/{your_model_name}/metadata
Result is json, where you can easily find model specification and signature definition.

It is possible to get model status as well as model metadata. In the other answer only metadata is requested and the response, response.metadata['signature_def'] still needs to be decoded.
I found the solution is to use the built-in protobuf method MessageToJson() to convert to json string. This can then be converted to a python dictionary with json.loads()
import grpc
import json
from tensorflow_serving.apis import predict_pb2
from tensorflow_serving.apis import prediction_service_pb2_grpc
from tensorflow_serving.apis import model_service_pb2_grpc
from tensorflow_serving.apis import get_model_status_pb2
from tensorflow_serving.apis import get_model_metadata_pb2
from google.protobuf.json_format import MessageToJson
PORT = 8500
model = "your_model_name"
channel = grpc.insecure_channel('localhost:{}'.format(PORT))
request = get_model_status_pb2.GetModelStatusRequest()
request.model_spec.name = model
result = stub.GetModelStatus(request, 5) # 5 secs timeout
print("Model status:")
print(result)
stub = prediction_service_pb2_grpc.PredictionServiceStub(channel)
request = get_model_metadata_pb2.GetModelMetadataRequest()
request.model_spec.name = model
request.metadata_field.append("signature_def")
result = stub.GetModelMetadata(request, 5) # 5 secs timeout
result = json.loads(MessageToJson(result))
print("Model metadata:")
print(result)

To continue the decoding process, either follow Tyler's approach and convert the message to JSON, or more natively Unpack into a SignatureDefMap and take it from there
signature_def_map = get_model_metadata_pb2.SignatureDefMap()
response.metadata['signature_def'].Unpack(signature_def_map)
print(signature_def_map.signature_def.keys())

To request data using REST API, for additional data of the particular model that is served, you can issue (via curl, Postman, etc.):
GET http://host:port/v1/models/${MODEL_NAME}
GET http://host:port/v1/models/${MODEL_NAME}/metadata
For more information, please check https://www.tensorflow.org/tfx/serving/api_rest

Related

Chatbot using Huggingface Transformers

I would like to use Huggingface Transformers to implement a chatbot. Currently, I have the code shown below. The transformer model already takes into account the history of past user input.
Is there something else (additional code) I have to take into account for building the chatbot?
Second, how can I modify my code to run with TensorFlow instead of PyTorch?
Later on, I also plan to fine-tune the model on other data. I also plan to test different models such as BlenderBot and GPT2. I think to test this different models it should be as easy as replacing the corresponding model in AutoTokenizer.from_pretrained("microsoft/DialoGPT-small") and AutoModelForCausalLM.from_pretrained("microsoft/DialoGPT-small")
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch
tokenizer = AutoTokenizer.from_pretrained("microsoft/DialoGPT-small")
model = AutoModelForCausalLM.from_pretrained("microsoft/DialoGPT-small")
for step in range(5):
# encode the new user input, add the eos_token and return a tensor in Pytorch
new_user_input_ids = tokenizer.encode(input(">> User:") + tokenizer.eos_token, return_tensors='pt')
# append the new user input tokens to the chat history
bot_input_ids = torch.cat([chat_history_ids, new_user_input_ids], dim=-1) if step > 0 else new_user_input_ids
# generated a response while limiting the total chat history to 1000 tokens,
chat_history_ids = model.generate(bot_input_ids, max_length=1000, pad_token_id=tokenizer.eos_token_id)
# pretty print last ouput tokens from bot
print("DialoGPT: {}".format(tokenizer.decode(chat_history_ids[:, bot_input_ids.shape[-1]:][0], skip_special_tokens=True)))
Here is an example of using the DialoGPT model with Tensorflow:
from transformers import TFAutoModelForCausalLM, AutoTokenizer, BlenderbotTokenizer, TFBlenderbotForConditionalGeneration
import tensorflow as tf
chat_bots = {
'BlenderBot': [BlenderbotTokenizer.from_pretrained('facebook/blenderbot-400M-distill'), TFT5ForConditionalGeneration.from_pretrained('facebook/blenderbot-400M-distill')],
'DialoGPT': [AutoTokenizer.from_pretrained("microsoft/DialoGPT-small"), TFAutoModelForCausalLM.from_pretrained("microsoft/DialoGPT-small")],
}
key = 'DialoGPT'
tokenizer, model = chat_bots[key]
for step in range(5):
new_user_input_ids = tokenizer.encode(input(">> User:") + tokenizer.eos_token, return_tensors='tf')
if step > 0:
bot_input_ids = tf.concat([chat_history_ids, new_user_input_ids], axis=-1)
else:
bot_input_ids = new_user_input_ids
chat_history_ids = model.generate(bot_input_ids, max_length=1000, pad_token_id=tokenizer.eos_token_id)
print(key + ": {}".format(tokenizer.decode(chat_history_ids[:, bot_input_ids.shape[-1]:][0], skip_special_tokens=True)))
>> User:How are you?
DialoGPT: I'm here
>> User:Why are you here
DialoGPT: I'm here
>> User:But why
DialoGPT: I'm here
>> User:Where is here
DialoGPT: Where is where?
>> User:Here
DialoGPT: Where is here?
If you want to compare different chatbots, you might want to adapt their decoder parameters, because they are not always identical. For example, using BlenderBot and a max_length of 50 you get this kind of response with the current code:
>> User:How are you?
BlenderBot: ! I am am great! how how how are are are???
In general, you should ask yourself which special characters are important for a chatbot (depending on your domain) and which characters should / can be omitted?
You should also experiment with different decoding methods such as greedy search, beam search, random sampling, top-k sampling, and nucleus sampling and find out what works best for your use case. For more information on this topic check out this post

Send a pickled dataframe as a file to FastAPI route

I would like to post a pickled dataframe file into a FastAPI route. However, I keep getting an error. Could anyone suggest how to fix my scripts?
This is interesting for me since I would like to preserve the dataframe column types & details. Therefore, posting the dataframe directly to a route is not an option as those details & types would be lost after JSONification.
main.py
from fastapi import FastAPI, File
app = FastAPI(debug=True)
#app.post("/luminex_file/")
async def handle_luminex_data(file: bytes = File(...)):
df = pd.read_pickle(io.BytesIO(file))
return {"file_size": len(file), "df": df}
request.py
files = {'file': open('dataframe.pickle', 'rb')}
response = requests.post("http://127.0.0.1:8000/luminex_file/", files=files)

How to feed an audio file from S3 bucket directly to Google speech-to-text

We are developing a speech application using Google's speech-to-text API. Now our data (audio files) get stored in S3 bucket on AWS. is there a way to directly pass the S3 URI to Google's speech-to-text API?
From their documentation it seems this is at the moment not possible in Google's speech-to-text API
This is not the case for their vision and NLP APIs.
Any ideas why this limitation for speech APIs?
And whats a good work around for this?
Currently, Google only allows audio files from either your local source or from Google's Cloud Storage. No reasonable explanation is given on the documentation about this.
Passing audio referenced by a URI
More typically, you will pass a uri parameter within the Speech request's audio field, pointing to an audio file (in binary format, not base64) located on Google Cloud Storage
I suggest you move your files to Google Cloud Storage. If you don't want to, there is a good workaround:
Use Google Cloud Speech API with streaming API. You are not required to store anything anywhere. Your speech application provides input from any microphone. And don't worry if you don't know handling inputs from microphone.
Google provides a sample code that does it all:
# [START speech_transcribe_streaming_mic]
from __future__ import division
import re
import sys
from google.cloud import speech
import pyaudio
from six.moves import queue
# Audio recording parameters
RATE = 16000
CHUNK = int(RATE / 10) # 100ms
class MicrophoneStream(object):
"""Opens a recording stream as a generator yielding the audio chunks."""
def __init__(self, rate, chunk):
self._rate = rate
self._chunk = chunk
# Create a thread-safe buffer of audio data
self._buff = queue.Queue()
self.closed = True
def __enter__(self):
self._audio_interface = pyaudio.PyAudio()
self._audio_stream = self._audio_interface.open(
format=pyaudio.paInt16,
channels=1,
rate=self._rate,
input=True,
frames_per_buffer=self._chunk,
# Run the audio stream asynchronously to fill the buffer object.
# This is necessary so that the input device's buffer doesn't
# overflow while the calling thread makes network requests, etc.
stream_callback=self._fill_buffer,
)
self.closed = False
return self
def __exit__(self, type, value, traceback):
self._audio_stream.stop_stream()
self._audio_stream.close()
self.closed = True
# Signal the generator to terminate so that the client's
# streaming_recognize method will not block the process termination.
self._buff.put(None)
self._audio_interface.terminate()
def _fill_buffer(self, in_data, frame_count, time_info, status_flags):
"""Continuously collect data from the audio stream, into the buffer."""
self._buff.put(in_data)
return None, pyaudio.paContinue
def generator(self):
while not self.closed:
# Use a blocking get() to ensure there's at least one chunk of
# data, and stop iteration if the chunk is None, indicating the
# end of the audio stream.
chunk = self._buff.get()
if chunk is None:
return
data = [chunk]
# Now consume whatever other data's still buffered.
while True:
try:
chunk = self._buff.get(block=False)
if chunk is None:
return
data.append(chunk)
except queue.Empty:
break
yield b"".join(data)
def listen_print_loop(responses):
"""Iterates through server responses and prints them.
The responses passed is a generator that will block until a response
is provided by the server.
Each response may contain multiple results, and each result may contain
multiple alternatives; for details, see the documentation. Here we
print only the transcription for the top alternative of the top result.
In this case, responses are provided for interim results as well. If the
response is an interim one, print a line feed at the end of it, to allow
the next result to overwrite it, until the response is a final one. For the
final one, print a newline to preserve the finalized transcription.
"""
num_chars_printed = 0
for response in responses:
if not response.results:
continue
# The `results` list is consecutive. For streaming, we only care about
# the first result being considered, since once it's `is_final`, it
# moves on to considering the next utterance.
result = response.results[0]
if not result.alternatives:
continue
# Display the transcription of the top alternative.
transcript = result.alternatives[0].transcript
# Display interim results, but with a carriage return at the end of the
# line, so subsequent lines will overwrite them.
#
# If the previous result was longer than this one, we need to print
# some extra spaces to overwrite the previous result
overwrite_chars = " " * (num_chars_printed - len(transcript))
if not result.is_final:
sys.stdout.write(transcript + overwrite_chars + "\r")
sys.stdout.flush()
num_chars_printed = len(transcript)
else:
print(transcript + overwrite_chars)
# Exit recognition if any of the transcribed phrases could be
# one of our keywords.
if re.search(r"\b(exit|quit)\b", transcript, re.I):
print("Exiting..")
break
num_chars_printed = 0
def main():
language_code = "en-US" # a BCP-47 language tag
client = speech.SpeechClient()
config = speech.RecognitionConfig(
encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
sample_rate_hertz=RATE,
language_code=language_code,
)
streaming_config = speech.StreamingRecognitionConfig(
config=config, interim_results=True
)
with MicrophoneStream(RATE, CHUNK) as stream:
audio_generator = stream.generator()
requests = (
speech.StreamingRecognizeRequest(audio_content=content)
for content in audio_generator
)
responses = client.streaming_recognize(streaming_config, requests)
# Now, put the transcription responses to use.
listen_print_loop(responses)
if __name__ == "__main__":
main()
# [END speech_transcribe_streaming_mic]
Dependencies are google-cloud-speech and pyaudio
For AWS S3, you can store your audio files there before/after you get the transcripts from Google Speech API.
Streaming is super fast as well.
And don't forget to include your credentials. You need to get authorized first by providing GOOGLE_APPLICATION_CREDENTIALS

If you use Tensorflow Dataset do you have to upload your data?

I have been looking at Tensorflow Dataset (TFDS) and seems really useful. The only issue I can see is that if you want to use it, you have to upload your dataset publicly to TFDS. Is that correct?
Is there anyway of using TFDS on a private server? Only for internal use?
You can easily create new local datasets, as described in the documentation. Here's an excerpt:
class MyDataset(tfds.core.GeneratorBasedBuilder):
"""DatasetBuilder for my_dataset dataset."""
VERSION = tfds.core.Version('1.0.0')
RELEASE_NOTES = {
'1.0.0': 'Initial release.',
}
def _info(self) -> tfds.core.DatasetInfo:
"""Dataset metadata (homepage, citation,...)."""
return tfds.core.DatasetInfo(
builder=self,
features=tfds.features.FeaturesDict({
'image': tfds.features.Image(shape=(256, 256, 3)),
'label': tfds.features.ClassLabel(names=['no', 'yes']),
}),
)
def _split_generators(self, dl_manager: tfds.download.DownloadManager):
"""Download the data and define splits."""
extracted_path = dl_manager.download_and_extract('http://data.org/data.zip')
# dl_manager returns pathlib-like objects with `path.read_text()`,
# `path.iterdir()`,...
return {
'train': self._generate_examples(path=extracted_path / 'train_images'),
'test': self._generate_examples(path=extracted_path / 'test_images'),
}
def _generate_examples(self, path) -> Iterator[Tuple[Key, Example]]:
"""Generator of examples for each split."""
for img_path in path.glob('*.jpeg'):
# Yields (key, example)
yield img_path.name, {
'image': img_path,
'label': 'yes' if img_path.name.startswith('yes_') else 'no',
}
If you want to see your datasets in TFDS you have to put your data in your GCS bucket or on your drive, but if you want that your data can be downloaded using some registration or password then your data also be implemented in TFDS but it was not directly downloaded and extracted by tfds data pipelines, user have to follow manually downloading instructions as given here.
Also there is a public GCS bucket of tfds in which you can also upload your data but before it you have to contact with members of TFDS (as they have to check if it's good to upload your data in tfds gcs), by putting data on TFDS GCS helps a lot it avoid some issues of bad servers as well as user can directly load your data using tfds.load API.

TFX StatisticsGen for image data

Hi I've trying to get a TFX Pipeline going just as an exercise really. I'm using ImportExampleGen to load TFRecords from disk. Each Example in the TFRecord contains a jpg in the form of a byte string, height, width, depth, steering and throttle labels.
I'm trying to use StatisticsGen but I'm receiving this warning;
WARNING:root:Feature "image_raw" has bytes value "None" which cannot be decoded as a UTF-8 string. and crashing my Colab Notebook. As far as I can tell all the byte-string images in the TFRecord are not corrupt.
I cannot find concrete examples on StatisticsGen and handling image data. According to the docs Tensorflow Data Validation can deal with image data.
In addition to computing a default set of data statistics, TFDV can also compute statistics for semantic domains (e.g., images, text). To enable computation of semantic domain statistics, pass a tfdv.StatsOptions object with enable_semantic_domain_stats set to True to tfdv.generate_statistics_from_tfrecord.
But I'm not sure how this fits in with StatisticsGen.
Here is the code that instantiates an ImportExampleGen then the StatisticsGen
from tfx.utils.dsl_utils import tfrecord_input
from tfx.components.example_gen.import_example_gen.component import ImportExampleGen
from tfx.proto import example_gen_pb2
examples = tfrecord_input(_tf_record_dir)
# https://www.tensorflow.org/tfx/guide/examplegen#custom_inputoutput_split
# has a good explanation of splitting the data the 'output_config' param
# Input train split is _tf_record_dir/*'
# Output 2 splits: train:eval=8:2.
train_ratio = 8
eval_ratio = 10-train_ratio
output = example_gen_pb2.Output(
split_config=example_gen_pb2.SplitConfig(splits=[
example_gen_pb2.SplitConfig.Split(name='train',
hash_buckets=train_ratio),
example_gen_pb2.SplitConfig.Split(name='eval',
hash_buckets=eval_ratio)
]))
example_gen = ImportExampleGen(input=examples,
output_config=output)
context.run(example_gen)
statistics_gen = StatisticsGen(
examples=example_gen.outputs['examples'])
context.run(statistics_gen)
Thanks in advance.
From git issue response
Thanks Evan Rosen
Hi Folks,
The warnings you are seeing indicate that StatisticsGen is trying to treat your raw image features like a categorical string feature. The image bytes are being decoded just fine. The issue is that when the stats (including top K examples) are being written, the output proto is expecting a UTF-8 valid string, but instead gets the raw image bytes. Nothing is wrong with your setups from what I can tell, but this is just an unintended side-effect of a well-intentioned warning in the event that you have a categorical string feature which can't be serialized. We'll look into finding a better default that handles image data more elegantly.
In the meantime, to tell StatisticsGen that this feature is really an opaque blob, you can pass in a user-modified schema as described in the StatsGen docs. To generate this schema, you can run StatisticsGen and SchemaGen once (on a sample of data) and then modify the inferred schema to annotate that image features. Here is a modified version of the colab from #tall-josh:
Open In Colab
The additional steps are a bit verbose, but having a curated schema is often a good practice for other reasons. Here is the cell that I added to the notebook:
from google.protobuf import text_format
from tensorflow.python.lib.io import file_io
from tensorflow_metadata.proto.v0 import schema_pb2
# Load autogenerated schema (using stats from small batch)
schema = tfx.utils.io_utils.SchemaReader().read(
tfx.utils.io_utils.get_only_uri_in_dir(
tfx.types.artifact_utils.get_single_uri(schema_gen.outputs['schema'].get())))
# Modify schema to indicate which string features are images.
# Ideally you would persist a golden version of this schema somewhere rather
# than regenerating it on every run.
for feature in schema.feature:
if feature.name == 'image/raw':
feature.image_domain.SetInParent()
# Write modified schema to local file
user_schema_dir ='/tmp/user-schema/'
tfx.utils.io_utils.write_pbtxt_file(
os.path.join(user_schema_dir, 'schema.pbtxt'), schema)
# Create ImportNode to make modified schema available to other components
user_schema_importer = tfx.components.ImporterNode(
instance_name='import_user_schema',
source_uri=user_schema_dir,
artifact_type=tfx.types.standard_artifacts.Schema)
# Run the user schema ImportNode
context.run(user_schema_importer)
Hopefully you find this workaround is useful. In the meantime, we'll take a look at a better default experience for image-valued features.
Groked this and found the solution to be dramatically simpler than i thought...
from tfx.orchestration.experimental.interactive.interactive_context import InteractiveContext
import logging
...
logger = logging.getLogger()
logger.setLevel(logging.CRITICAL)
...
context = InteractiveContext(pipeline_name='my_pipe')
...
c = StatisticsGen(...)
...
context.run(c)