Saving TFrecords with TPU - tensorflow

I'm trying to use tf.data.experimental.TFRecordWriter to save dataset on Google cloud bucket using TPU. The code from the example in documentation works:
dataset = tf.data.Dataset.range(3)
dataset = dataset.map(tf.io.serialize_tensor)
writer = tf.data.experimental.TFRecordWriter("gs://oleg-zyablov/test.tfrec")
writer.write(dataset)
But I have dataset of tuples (string, int64), where first is jpg-encoded image and second is label. When I pass it to writer.write() method, it says: 'tuple' object has no attribute 'is_compatible_with'.
I guess I have to pack image and label into tf.train.Example to make it work. I use the following code:
def serialize(image, class_idx):
tfrecord = tf.train.Example(features = tf.train.Features(feature = {
'image': tf.train.Feature(bytes_list = tf.train.BytesList(value = [image.numpy()])),
'class': tf.train.Feature(int64_list = tf.train.Int64List(value = [class_idx.numpy()]))
}))
return tfrecord.SerializeToString()
#saving_pipeline is a dataset of (string, int64) tuples
saving_pipeline_serialized = saving_pipeline.map(serialize)
writer = tf.data.experimental.TFRecordWriter("gs://oleg-zyablov/car-classification/train_tfrecords/test.tfrecord")
writer.write(saving_pipeline_serialized)
But I get the following error:
'Tensor' object has no attribute 'numpy'
Although I didn't turn off eager mode and this code tf.constant([], dtype = float).numpy() works. Maybe TPU works not in eager mode? Ok, I changed .numpy() to .eval() in the code above. Then I get the foloowing error:
Cannot evaluate tensor using `eval()`: No default session is registered. Use `with sess.as_default()` or pass an explicit session to `eval(session=sess)`
What session does TPU use and how do I specify it? When I run the code below:
with tf.compat.v1.Session():
saving_pipeline_serialized = saving_pipeline.map(serialize)
I get an error:
Cannot use the default session to evaluate tensor: the tensor's graph is different from the session's graph. Pass an explicit session to `eval(session=sess)`.
But I don't know how to get the current graph and pass it to tf.compat.v1.Session(). When I go another way and type:
image.eval(session = tf.compat.v1.get_default_session())
It says:
Cannot evaluate tensor using `eval()`: No default session is registered. Use `with sess.as_default()` or pass an explicit session to `eval(session=sess)`
Is it possible to use .eval() on TPU? Or how do I perform my task another way?

Related

AttributeError: 'Tensor' object has no attribute 'numpy' eager execution is enabled using version 2.4.1

I've been trying to convert a generator I built to a tf.data.dataset.
I've come far and now I have something simple like this
def parse_image(filename):
file = tf.io.read_file(filename) # this will work only with filename as tensor
image = tf.image.decode_image(file)
return image
def transform_img(img):
img = parse_image(img).numpy()
img = transforms_train(image = img)["image"]
return img
transform img works as expected when I call it on a filename itself. like:
plt.imshow(transform_img(array_of_filenames[0]))
but when I map it on a dataset
dataset = tf.data.Dataset.from_tensor_slices(array_of_filenames)
dataset = dataset.map(transform_img)
I get the error in the title.
I am doing something silly again aren't I?
Thanks for helping!
It is not possible to use numpy inside the map function of tensorflow dataset. Otherwise, you need to wrap the function in tf.py_function or tf.numpy_function. So it should look like the following:
dataset = dataset.map(lambda: item: tf.py_function(transform_img, [item], [tf.float32]))
The first argument of py_function is the preprocessing function you want, the second argument is the parameter to pass to the function. The final argument is the dtype of the return of preprocess function. (same applies to tf.numpy_function)
I don't remember reading this in documentation but in a tutorial, you can find it here.

Initialise tensors with ones in tensorflow 1

I want to initialise a variable declared using get_variable function with 1s .
I tried the following methods :
1.tf.get_variable(name = 'yd1', shape = shape_t, dtype = tf.float32,initializer = tf.ones())
Error received -> TypeError: ones() takes at least 1 argument (0 given)
tf.get_variable(name = 'yd1', shape = shape_t ,dtype = tf.float32,initializer = tf.ones(shape=shape_t))
Error received -> ValueError("If initializer is a constant, do not specify shape.")
What is the best way to initialise a variable with ones?
tf.zeros_initializer can be used to initialise with 0s, but there is no equivalent for ones in tf 1
You need to use tf.ones_initializer:
tf.get_variable(name='yd1', shape=shape_t, dtype=tf.float32,
initializer=tf.ones_initializer())
Alternatively, as the second error message says, you can use a constant value, but then do not pass a shape:
tf.get_variable(name='yd1', initializer=tf.ones(shape=shape_t, dtype=tf.float32))
tf.ones_initializer is not available in tf 1, It was introduced in tf 2.
The following code does the work
tf.get_variable(name = 'yd1', shape = shape_t ,dtype = tf.float32,initializer = tf.constant_initializer(1))

How to read a utf-8 encoded binary string in tensorflow?

I am trying to convert an encoded byte string back into the original array in the tensorflow graph (using tensorflow operations) in order to make a prediction in a tensorflow model. The array to byte conversion is based on this answer and it is the suggested input to tensorflow model prediction on google cloud's ml-engine.
def array_request_example(input_array):
input_array = input_array.astype(np.float32)
byte_string = input_array.tostring()
string_encoded_contents = base64.b64encode(byte_string)
return string_encoded_contents.decode('utf-8')}
Tensorflow code
byte_string = tf.placeholder(dtype=tf.string)
audio_samples = tf.decode_raw(byte_string, tf.float32)
audio_array = np.array([1, 2, 3, 4])
bstring = array_request_example(audio_array)
fdict = {byte_string: bstring}
with tf.Session() as sess:
[tf_samples] = sess.run([audio_samples], feed_dict=fdict)
I have tried using decode_raw and decode_base64 but neither return the original values.
I have tried setting the the out_type of decode raw to the different possible datatypes and tried altering what data type I am converting the original array to.
So, how would I read the byte array in tensorflow? Thanks :)
Extra Info
The aim behind this is to create the serving input function for a custom Estimator to make predictions using gcloud ml-engine local predict (for testing) and using the REST API for the model stored on the cloud.
The serving input function for the Estimator is
def serving_input_fn():
feature_placeholders = {'b64': tf.placeholder(dtype=tf.string,
shape=[None],
name='source')}
audio_samples = tf.decode_raw(feature_placeholders['b64'], tf.float32)
# Dummy function to save space
power_spectrogram = create_spectrogram_from_audio(audio_samples)
inputs = {'spectrogram': power_spectrogram}
return tf.estimator.export.ServingInputReceiver(inputs, feature_placeholders)
Json request
I use .decode('utf-8') because when attempting to json dump the base64 encoded byte strings I receive this error
raise TypeError(repr(o) + " is not JSON serializable")
TypeError: b'longbytestring'
Prediction Errors
When passing the json request {'audio_bytes': 'b64': bytestring} with gcloud local I get the error
PredictionError: Invalid inputs: Expected tensor name: b64, got tensor name: [u'audio_bytes']
So perhaps google cloud local predict does not automatically handle the audio bytes and base64 conversion? Or likely somethings wrong with my Estimator setup.
And the request {'instances': [{'audio_bytes': 'b64': bytestring}]} to REST API gives
{'error': 'Prediction failed: Error during model execution: AbortionError(code=StatusCode.INVALID_ARGUMENT, details="Input to DecodeRaw has length 793713 that is not a multiple of 4, the size of float\n\t [[Node: DecodeRaw = DecodeRaw[_output_shapes=[[?,?]], little_endian=true, out_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_source_0_0)]]")'}
which confuses me as I explicitly define the request to be a float and do the same in the serving input receiver.
Removing audio_bytes from the request and utf-8 encoding the byte strings allows me to get predictions, though in testing the decoding locally, I think the audio is being incorrectly converted from the byte string.
The answer that you referenced, is written assuming you are running the model on CloudML Engine's service. The service actually takes care of the JSON (including UTF-8) and base64 encoding.
To get your code working locally or in another environment, you'll need the following changes:
def array_request_example(input_array):
input_array = input_array.astype(np.float32)
return input_array.tostring()
byte_string = tf.placeholder(dtype=tf.string)
audio_samples = tf.decode_raw(byte_string, tf.float32)
audio_array = np.array([1, 2, 3, 4])
bstring = array_request_example(audio_array)
fdict = {byte_string: bstring}
with tf.Session() as sess:
tf_samples = sess.run([audio_samples], feed_dict=fdict)
That said, based on your code, I suspect you are looking to send data as JSON; you can use gcloud local predict to simulate CloudML Engine's service. Or, if you prefer to write your own code, perhaps something like this:
def array_request_examples,(input_arrays):
"""input_arrays is a list (batch) of np_arrays)"""
input_arrays = (a.astype(np.float32) for a in input_arrays)
# Convert each image to byte strings
bytes_strings = (a.tostring() for a in input_arrays)
# Base64 encode the data
encoded = (base64.b64encode(b) for b in bytes_strings)
# Create a list of images suitable to send to the service as JSON:
instances = [{'audio_bytes': {'b64': e}} for e in encoded]
# Create a JSON request
return json.dumps({'instances': instances})
def parse_request(request):
# non-TF to simulate the CloudML Service which does not expect
# this to be in the submitted graphs.
instances = json.loads(request)['instances']
return [base64.b64decode(i['audio_bytes']['b64']) for i in instances]
byte_strings = tf.placeholder(dtype=tf.string, shape=[None])
decode = lambda raw_byte_str: tf.decode_raw(raw_byte_str, tf.float32)
audio_samples = tf.map_fn(decode, byte_strings, dtype=tf.float32)
audio_array = np.array([1, 2, 3, 4])
request = array_request_examples([audio_array])
fdict = {byte_strings: parse_request(request)}
with tf.Session() as sess:
tf_samples = sess.run([audio_samples], feed_dict=fdict)

Correct format for input data on CloudML

I'm trying to send a job up to my object detection model on CloudML to get predictions. I'm following the guide at https://cloud.google.com/ml-engine/docs/online-predict but I'm getting an error when submitting the request:
RuntimeError: Prediction failed: Error processing input: Expected uint8, got '\xf6>\x00\x01\x04\xa4d\x94...(more bytes)...\x00\x10\x10\x10\x04\x80\xd9' of type 'str' instead.
This is my code:
img = base64.b64encode(open("file.jpg", "rb").read()).decode('utf-8')
json = {"b64": img}
result = predict_json(project, model, json, "v1")
My fault, I forgot to add --input_type encoded_image_string_tensor when I exported the graph.

Changing label name when retraining Inception on Google Cloud ML

I currently follow the tutorial to retrain Inception for image classification:
https://cloud.google.com/blog/big-data/2016/12/how-to-train-and-classify-images-using-google-cloud-machine-learning-and-cloud-dataflow
However, when I make a prediction with the API I get only the index of my class as a label. However I would like that the API actually gives me a string back with the actual class name e.g instead of
​predictions:
- key: '0'
prediction: 4
scores:
- 8.11998e-09
- 2.64907e-08
- 1.10307e-06
I would like to get:
​predictions:
- key: '0'
prediction: ROSES
scores:
- 8.11998e-09
- 2.64907e-08
- 1.10307e-06
Looking at the reference for the Google API it should be possible:
https://cloud.google.com/ml-engine/reference/rest/v1/projects/predict
I already tried to change in the model.py the following to
outputs = {
'key': keys.name,
'prediction': tensors.predictions[0].name,
'scores': tensors.predictions[1].name
}
tf.add_to_collection('outputs', json.dumps(outputs))
to
if tensors.predictions[0].name == 0:
pred_name ='roses'
elif tensors.predictions[0].name == 1:
pred_name ='tulips'
outputs = {
'key': keys.name,
'prediction': pred_name,
'scores': tensors.predictions[1].name
}
tf.add_to_collection('outputs', json.dumps(outputs))
but this doesn't work.
My next idea was to change this part in the preprocess.py file. So instead getting the index I want to use the string label.
def process(self, row, all_labels):
try:
row = row.element
except AttributeError:
pass
if not self.label_to_id_map:
for i, label in enumerate(all_labels):
label = label.strip()
if label:
self.label_to_id_map[label] = label #i
and
label_ids = []
for label in row[1:]:
try:
label_ids.append(label.strip())
#label_ids.append(self.label_to_id_map[label.strip()])
except KeyError:
unknown_label.inc()
but this gives the error:
TypeError: 'roses' has type <type 'str'>, but expected one of: (<type 'int'>, <type 'long'>) [while running 'Embed and make TFExample']
hence I thought that I should change something here in preprocess.py, in order to allow strings:
example = tf.train.Example(features=tf.train.Features(feature={
'image_uri': _bytes_feature([uri]),
'embedding': _float_feature(embedding.ravel().tolist()),
}))
if label_ids:
label_ids.sort()
example.features.feature['label'].int64_list.value.extend(label_ids)
But I don't know how to change it appropriately as I could not find someting like str_list. Could anyone please help me out here?
Online prediction certainly allows this, the model itself needs to be updated to do the conversion from int to string.
Keep in mind that the Python code is just building a graph which describes what computation to do in your model -- you're not sending the Python code to online prediction, you're sending the graph you build.
That distinction is important because the changes you have made are in Python -- you don't yet have any inputs or predictions, so you won't be able to inspect their values. What you need to do instead is add the equivalent lookups to the graph that you're exporting.
You could modify the code like so:
labels = tf.constant(['cars', 'trucks', 'suvs'])
predicted_indices = tf.argmax(softmax, 1)
prediction = tf.gather(labels, predicted_indices)
And leave the inputs/outputs untouched from the original code