Changing label name when retraining Inception on Google Cloud ML - tensorflow

I currently follow the tutorial to retrain Inception for image classification:
https://cloud.google.com/blog/big-data/2016/12/how-to-train-and-classify-images-using-google-cloud-machine-learning-and-cloud-dataflow
However, when I make a prediction with the API I get only the index of my class as a label. However I would like that the API actually gives me a string back with the actual class name e.g instead of
​predictions:
- key: '0'
prediction: 4
scores:
- 8.11998e-09
- 2.64907e-08
- 1.10307e-06
I would like to get:
​predictions:
- key: '0'
prediction: ROSES
scores:
- 8.11998e-09
- 2.64907e-08
- 1.10307e-06
Looking at the reference for the Google API it should be possible:
https://cloud.google.com/ml-engine/reference/rest/v1/projects/predict
I already tried to change in the model.py the following to
outputs = {
'key': keys.name,
'prediction': tensors.predictions[0].name,
'scores': tensors.predictions[1].name
}
tf.add_to_collection('outputs', json.dumps(outputs))
to
if tensors.predictions[0].name == 0:
pred_name ='roses'
elif tensors.predictions[0].name == 1:
pred_name ='tulips'
outputs = {
'key': keys.name,
'prediction': pred_name,
'scores': tensors.predictions[1].name
}
tf.add_to_collection('outputs', json.dumps(outputs))
but this doesn't work.
My next idea was to change this part in the preprocess.py file. So instead getting the index I want to use the string label.
def process(self, row, all_labels):
try:
row = row.element
except AttributeError:
pass
if not self.label_to_id_map:
for i, label in enumerate(all_labels):
label = label.strip()
if label:
self.label_to_id_map[label] = label #i
and
label_ids = []
for label in row[1:]:
try:
label_ids.append(label.strip())
#label_ids.append(self.label_to_id_map[label.strip()])
except KeyError:
unknown_label.inc()
but this gives the error:
TypeError: 'roses' has type <type 'str'>, but expected one of: (<type 'int'>, <type 'long'>) [while running 'Embed and make TFExample']
hence I thought that I should change something here in preprocess.py, in order to allow strings:
example = tf.train.Example(features=tf.train.Features(feature={
'image_uri': _bytes_feature([uri]),
'embedding': _float_feature(embedding.ravel().tolist()),
}))
if label_ids:
label_ids.sort()
example.features.feature['label'].int64_list.value.extend(label_ids)
But I don't know how to change it appropriately as I could not find someting like str_list. Could anyone please help me out here?

Online prediction certainly allows this, the model itself needs to be updated to do the conversion from int to string.
Keep in mind that the Python code is just building a graph which describes what computation to do in your model -- you're not sending the Python code to online prediction, you're sending the graph you build.
That distinction is important because the changes you have made are in Python -- you don't yet have any inputs or predictions, so you won't be able to inspect their values. What you need to do instead is add the equivalent lookups to the graph that you're exporting.
You could modify the code like so:
labels = tf.constant(['cars', 'trucks', 'suvs'])
predicted_indices = tf.argmax(softmax, 1)
prediction = tf.gather(labels, predicted_indices)
And leave the inputs/outputs untouched from the original code

Related

Tensorflow v2.10 mutate output of signature function to be a map of label to results

I'm trying to save my model so that when called from tf-serving the output is:
{
"results": [
{ "label1": x.xxxxx, "label2": x.xxxxx },
{ "label1": x.xxxxx, "label2": x.xxxxx }
]
}
where label1 and label2 are my labels and x.xxxxx are the probability of that label.
This is what I'm trying:
class TFModel(tf.Module):
def __init__(self, model: tf.keras.Model) -> None:
self.labels = ['label1', 'label2']
self.model = model
#tf.function(input_signature=[tf.TensorSpec(shape=(1, ), dtype=tf.string)])
def prediction(self, pagetext: str):
return
{ 'results': tf.constant([{k: v for dct in [{self.labels[c]: f"{x:.5f}"} for (c,x) in enumerate(results[i])] for k, v in dct.items()}
for i in range(len(results.numpy()))])}
# and then save it:
tf_model_wrapper = TFModel(classifier_model)
tf.saved_model.save(tf_model_wrapper.model,
saved_model_path,
signatures={'serving_default':tf_model_wrapper.prediction}
)
Side Note: Apparently in TensorFlow v2.0 if signatures is omitted it should scan the object for the first #tf.function (according to this: https://www.tensorflow.org/api_docs/python/tf/saved_model/save) but in reality that doesn't seem to work. Instead, the model saves successfully with no errors and the #tf.function is not called, but default output is returned instead.
The error I get from the above is:
ValueError: Got a non-Tensor value <tf.Operation 'PartitionedCall' type=PartitionedCall> for key 'output_0' in the output of the function __inference_prediction_125493 used to generate the SavedModel signature 'serving_default'. Outputs for functions used as signatures must be a single Tensor, a sequence of Tensors, or a dictionary from string to Tensor.
I wrapped the result in tf.constant above because of this error, thinking it might be a quick fix, but I think it's me just being naive and not understanding Tensors properly.
I tried a bunch of other things before learning that [all outputs must be return values].1
How can I change the output to be as I want it to be?
You can see a Tensor as a multidimensional vector, i.e a structure with a fixed size and dimension and containing elements sharing the same type. Your return value is a map between a string and a list of dictionaries. A list of dictionaries cannot be converted to a tensor, because there is no guarantee that the number of dimensions and their size is constant, nor a guarantee that each element is sharing the same type.
You could instead return the raw output of your network, which should be a tensor and do your post processing outside of tensorflow-serving.
If you really want to do something like in your question, you can use a Tensor of strings instead, and you could use some code like that:
labels = tf.constant(['label1', 'label2'])
# if your batch size is dynamic, you can use tf.shape on your results variable to find it at runtime
batch_size = 32
# assuming your model returns something with the shape (N,2)
results = tf.random.uniform((batch_size,2))
res_as_str = tf.strings.as_string(results, precision=5)
return {
"results": tf.stack(
[tf.tile(labels[None, :], [batch_size, 1]), res_as_str], axis=-1
)
}
The output will be a dictionary mapping the value "results" to a Tensor of dimensions (Batch, number of labels, 2), the last dimension containing the label name and its corresponding value.

TypeError: 'Value' object is not iterable : iterate around a Dataframe for prediction purpose with GCP Natural Language Model

I'm trying to iterate over a dataframe in order to apply a predict function, which calls a Natural Language Model located on GCP. Here is the loop code :
model = 'XXXXXXXXXXXXXXXX'
barometre_df_processed = barometre_df
barometre_df_processed['theme'] = ''
barometre_df_processed['proba'] = ''
print('DEBUT BOUCLE FOR')
for ind in barometre_df.index:
if barometre_df.verbatim[ind] is np.nan :
barometre_df_processed.theme[ind]="RAS"
barometre_df_processed.proba[ind]="1"
else:
print(barometre_df.verbatim[ind])
print(type(barometre_df.verbatim[ind]))
res = get_prediction(file_path={'text_snippet': {'content': barometre_df.verbatim[ind]},'mime_type': 'text/plain'} },model_name=model)
print(res)
theme = res['displayNames']
proba = res["classification"]["score"]
barometre_df_processed.theme[ind]=theme
barometre_df_processed.proba[ind]=proba
and the get_prediction function that I took from the Natural Language AI Documentation :
def get_prediction(file_path, model_name):
options = ClientOptions(api_endpoint='eu-automl.googleapis.com:443')
prediction_client = automl_v1.PredictionServiceClient(client_options=options)
payload = file_path
# Uncomment the following line (and comment the above line) if want to predict on PDFs.
# payload = pdf_payload(file_path)
parameters_dict = {}
params = json_format.ParseDict(parameters_dict, Value())
request = prediction_client.predict(name=model_name, payload=payload, params=params)
print("fonction prediction")
print(request)
return resultat[0]["displayName"], resultat[0]["classification"]["score"], resultat[1]["displayName"], resultat[1]["classification"]["score"], resultat[2]["displayName"], resultat[2]["classification"]["score"]
I'm doing a loop this way because I want each of my couple [displayNames, score] to create a new line on my final dataframe, to have something like this :
verbatim1, theme1, proba1
verbatim1, theme2, proba2
verbatim1, theme3, proba3
verbatim2, theme1, proba1
verbatim2, theme2, proba2
...
The if barometre_df.verbatim[ind] is np.nan is not causing problems, I just use it to deal with nans, don't take care of it.
The error that I have is this one :
TypeError: 'Value' object is not iterable
I guess the issues is about
res = get_prediction(file_path={'text_snippet': {'content': barometre_df.verbatim[ind]} },model_name=model)
but I can't figure what's goign wrong here.
I already try to remove
,'mime_type': 'text/plain'}
from my get_prediction parameters, but it doesn't change anything.
Does someone knows how to deal with this issue ?
Thank you already.
I think you are not iterating correctly.
The way to iterate through a dataframe is:
for index, row in df.iterrows():
print(row['col1'])

AttributeError: 'Tensor' object has no attribute 'numpy' eager execution is enabled using version 2.4.1

I've been trying to convert a generator I built to a tf.data.dataset.
I've come far and now I have something simple like this
def parse_image(filename):
file = tf.io.read_file(filename) # this will work only with filename as tensor
image = tf.image.decode_image(file)
return image
def transform_img(img):
img = parse_image(img).numpy()
img = transforms_train(image = img)["image"]
return img
transform img works as expected when I call it on a filename itself. like:
plt.imshow(transform_img(array_of_filenames[0]))
but when I map it on a dataset
dataset = tf.data.Dataset.from_tensor_slices(array_of_filenames)
dataset = dataset.map(transform_img)
I get the error in the title.
I am doing something silly again aren't I?
Thanks for helping!
It is not possible to use numpy inside the map function of tensorflow dataset. Otherwise, you need to wrap the function in tf.py_function or tf.numpy_function. So it should look like the following:
dataset = dataset.map(lambda: item: tf.py_function(transform_img, [item], [tf.float32]))
The first argument of py_function is the preprocessing function you want, the second argument is the parameter to pass to the function. The final argument is the dtype of the return of preprocess function. (same applies to tf.numpy_function)
I don't remember reading this in documentation but in a tutorial, you can find it here.

Saving TFrecords with TPU

I'm trying to use tf.data.experimental.TFRecordWriter to save dataset on Google cloud bucket using TPU. The code from the example in documentation works:
dataset = tf.data.Dataset.range(3)
dataset = dataset.map(tf.io.serialize_tensor)
writer = tf.data.experimental.TFRecordWriter("gs://oleg-zyablov/test.tfrec")
writer.write(dataset)
But I have dataset of tuples (string, int64), where first is jpg-encoded image and second is label. When I pass it to writer.write() method, it says: 'tuple' object has no attribute 'is_compatible_with'.
I guess I have to pack image and label into tf.train.Example to make it work. I use the following code:
def serialize(image, class_idx):
tfrecord = tf.train.Example(features = tf.train.Features(feature = {
'image': tf.train.Feature(bytes_list = tf.train.BytesList(value = [image.numpy()])),
'class': tf.train.Feature(int64_list = tf.train.Int64List(value = [class_idx.numpy()]))
}))
return tfrecord.SerializeToString()
#saving_pipeline is a dataset of (string, int64) tuples
saving_pipeline_serialized = saving_pipeline.map(serialize)
writer = tf.data.experimental.TFRecordWriter("gs://oleg-zyablov/car-classification/train_tfrecords/test.tfrecord")
writer.write(saving_pipeline_serialized)
But I get the following error:
'Tensor' object has no attribute 'numpy'
Although I didn't turn off eager mode and this code tf.constant([], dtype = float).numpy() works. Maybe TPU works not in eager mode? Ok, I changed .numpy() to .eval() in the code above. Then I get the foloowing error:
Cannot evaluate tensor using `eval()`: No default session is registered. Use `with sess.as_default()` or pass an explicit session to `eval(session=sess)`
What session does TPU use and how do I specify it? When I run the code below:
with tf.compat.v1.Session():
saving_pipeline_serialized = saving_pipeline.map(serialize)
I get an error:
Cannot use the default session to evaluate tensor: the tensor's graph is different from the session's graph. Pass an explicit session to `eval(session=sess)`.
But I don't know how to get the current graph and pass it to tf.compat.v1.Session(). When I go another way and type:
image.eval(session = tf.compat.v1.get_default_session())
It says:
Cannot evaluate tensor using `eval()`: No default session is registered. Use `with sess.as_default()` or pass an explicit session to `eval(session=sess)`
Is it possible to use .eval() on TPU? Or how do I perform my task another way?

How to read parameters of layers of .tflite model in python

I was trying to read tflite model and pull all the parameters of the layers out.
My steps:
I generated flatbuffers model representation by running (please build flatc before):
flatc -python tensorflow/tensorflow/lite/schema/schema.fbs
Result is tflite/ folder that contains layer description files (*.py) and some utilitarian files.
I successfully loaded model:
in case of import Error: set PYTHONPATH to point to the folder where tflite/ is
from tflite.Model import Model
def read_tflite_model(file):
buf = open(file, "rb").read()
buf = bytearray(buf)
model = Model.GetRootAsModel(buf, 0)
return model
I partly pulled model and node parameters out and stacked in iterating over nodes:
Model part:
def print_model_info(model):
version = model.Version()
print("Model version:", version)
description = model.Description().decode('utf-8')
print("Description:", description)
subgraph_len = model.SubgraphsLength()
print("Subgraph length:", subgraph_len)
Nodes part:
def print_nodes_info(model):
# what does this 0 mean? should it always be zero?
subgraph = model.Subgraphs(0)
operators_len = subgraph.OperatorsLength()
print('Operators length:', operators_len)
from collections import deque
nodes = deque(subgraph.InputsAsNumpy())
STEP_N = 0
MAX_STEPS = operators_len
print("Nodes info:")
while len(nodes) != 0 and STEP_N <= MAX_STEPS:
print("MAX_STEPS={} STEP_N={}".format(MAX_STEPS, STEP_N))
print("-" * 60)
node_id = nodes.pop()
print("Node id:", node_id)
tensor = subgraph.Tensors(node_id)
print("Node name:", tensor.Name().decode('utf-8'))
print("Node shape:", tensor.ShapeAsNumpy())
# which type is it? what does it mean?
type_of_tensor = tensor.Type()
print("Tensor type:", type_of_tensor)
quantization = tensor.Quantization()
min = quantization.MinAsNumpy()
max = quantization.MaxAsNumpy()
scale = quantization.ScaleAsNumpy()
zero_point = quantization.ZeroPointAsNumpy()
print("Quantization: ({}, {}), s={}, z={}".format(min, max, scale, zero_point))
# I do not understand it again. what is j, that I set to 0 here?
operator = subgraph.Operators(0)
for i in operator.OutputsAsNumpy():
nodes.appendleft(i)
STEP_N += 1
print("-"*60)
Please point me to documentation or some example of using this API.
My problems are:
I can not get documentation on this API
Iterating over Tensor objects seems not possible for me, as it doesn't have Inputs and Outputs methods. + subgraph.Operators(j=0) I do not understand what j means in here. Because of that my cycle goes through two nodes: input (once) and the next one over and over again.
Iterating over Operator objects is surely possible:
Here we iterate over them all but I can not get how to map Operator and Tensor.
def print_in_out_info_of_all_operators(model):
# what does this 0 mean? should it always be zero?
subgraph = model.Subgraphs(0)
for i in range(subgraph.OperatorsLength()):
operator = subgraph.Operators(i)
print('Outputs', operator.OutputsAsNumpy())
print('Inputs', operator.InputsAsNumpy())
I do not understand how to pull parameters out Operator object. BuiltinOptions method gives me Table object, that I do not know what to map at.
subgraph = model.Subgraphs(0)
What does this 0 mean? should it always be zero? obviously no, but what is it? Id of the subgraph? If so - I'm happy. If no, please try to explain it.