Tensorflow 2 custom dataset Sequence - tensorflow2.0

I have a dataset in a python dictionary. The structure is as follow:
data.data['0']['input'],data.data['0']['target'],data.data['0']['length']
Both input and target are arrays of size (n,) and length is an int.
I have created a class object with tf.keras.utils.Sequence and specify __getitem__ as this:
def __getitem__(self, idx):
idx = str(idx)
return {
'input': np.asarray(self.data[idx]['input']),
'target': np.asarray(self.data[idx]['target']),
'length': self.data[idx]['length']
}
How can I iterate over such dataset using tf.data.Dataset? I am getting this error if I try to use from_tensor_slices
ValueError: Attempt to convert a value with an unsupported type (<class 'dict'>) to a Tensor.

I think you should modify the dictionary to a tensor as proposed here convert a dictionary to a tensor
or change the dictionary to a text file or to a tfrecords. Hope this would help you!

Related

convert tf.dense Tensor to tf.one_hot Tensor on Graph execution Tensorflow

TF version: 2.11
I try to train a simple 2input classifier with TFRecords tf.data pipeline
I do not manage to convert the tf.dense Tensor with containing only a scalar to a tf.onehot vector
# get all recorddatasets abspath
training_names= [record_path+'/'+rec for rec in os.listdir(record_path) if rec.startswith('train')]
# load in tf dataset
train_dataset = tf.data.TFRecordDataset(training_names[1])
train_dataset = train_dataset.map(return_xy)
mapping function:
def return_xy(example_proto):
#parse example
sample= parse_function(example_proto)
#decode image 1
encoded_image1 = sample['image/encoded_1']
decoded_image1 = decode_image(encoded_image1)
#decode image 2
encoded_image2 = sample['image/encoded_2']
decoded_image2 = decode_image(encoded_image2)
#decode label
print(f'image/object/class/'+level: {sample['image/object/class/'+level]}')
class_label = tf.sparse.to_dense(sample['image/object/class/'+level])
print(f'type of class label :{type(class_label)}')
print(class_label)
# conversion to onehot with depth 26 :: -> how can i extract only the value or convert directly to tf.onehot??
label_onehot=tf.one_hot(class_label,26)
#resizing image
input_left=tf.image.resize(decoded_image1,[416, 416])
input_right=tf.image.resize(decoded_image2,[416, 416])
return {'input_3res1':input_left, 'input_5res2':input_right} , label_onehot
output:
image/object/class/'+level: SparseTensor(indices=Tensor("ParseSingleExample/ParseExample/ParseExampleV2:14", shape=(None, 1), dtype=int64), values=Tensor("ParseSingleExample/ParseExample/ParseExampleV2:31", shape=(None,), dtype=int64), dense_shape=Tensor("ParseSingleExample/ParseExample/ParseExampleV2:48", shape=(1,), dtype=int64))
type of class label :<class 'tensorflow.python.framework.ops.Tensor'>
Tensor("SparseToDense:0", shape=(None,), dtype=int64)
However I am sure that the label is in this Tensor because when run it eagerly
raw_dataset = tf.data.TFRecordDataset([rec_file])
parsed_dataset = raw_dataset.map(parse_function) # only parsing
for sample in parsed_dataset:
class_label=tf.sparse.to_dense(sample['image/object/class/label_level3'])[0]
print(f'type of class label :{type(class_label)}')
print(f'labels from labelmap :{class_label}')
I get output:
type of class label :<class 'tensorflow.python.framework.ops.EagerTensor'>
labels from labelmap :7
If I just chose a random number for the label and pass it to tf_one_hot(randint, 26) then the model begins to train (obviously nonsensical).
So the question is how can i convert the:
Tensor("SparseToDense:0", shape=(None,), dtype=int64)
to a
Tensor("one_hot:0", shape=(26,), dtype=float32)
What I tried so far
in the call data.map(parse_xy)
i tried to just call .numpy() on the tf tensors but didnt work , this only works for eager tensors.
In my understanding i cannot use eager execution because everthing in the parse_xy function gets excecuted on the whole graph:
ive already tried to enable eager execution -> failed
https://www.tensorflow.org/api_docs/python/tf/config/run_functions_eagerly
Note: This flag has no effect on functions passed into tf.data transformations as arguments.
tf.data functions are never executed eagerly and are always executed as a compiled Tensorflow Graph.
ive also tried to use the tf_pyfunc but this only returns another tf.Tensor with an unknown shape
def get_onehot(tensor):
class_label=tensor[0]
return tf.one_hot(class_label,26)
and add the line in parse_xy:
label_onehot=tf.py_function(func=get_onehot, inp=[class_label], Tout=tf.int64)
but there i always get an unknown shape which a cannot just alter with .set_shape()
I was able to solve the issue by only using TensorFlow functions.
tf.gather allows to index a TensorFlow tensor:
class_label_gather = tf.sparse.to_dense(sample['image/object/class/'+level])
class_indices = tf.gather(tf.cast(class_label_gather,dtype=tf.int32),0)
label_onehot=tf.one_hot(class_indices,26)

Tensorflow v2.10 mutate output of signature function to be a map of label to results

I'm trying to save my model so that when called from tf-serving the output is:
{
"results": [
{ "label1": x.xxxxx, "label2": x.xxxxx },
{ "label1": x.xxxxx, "label2": x.xxxxx }
]
}
where label1 and label2 are my labels and x.xxxxx are the probability of that label.
This is what I'm trying:
class TFModel(tf.Module):
def __init__(self, model: tf.keras.Model) -> None:
self.labels = ['label1', 'label2']
self.model = model
#tf.function(input_signature=[tf.TensorSpec(shape=(1, ), dtype=tf.string)])
def prediction(self, pagetext: str):
return
{ 'results': tf.constant([{k: v for dct in [{self.labels[c]: f"{x:.5f}"} for (c,x) in enumerate(results[i])] for k, v in dct.items()}
for i in range(len(results.numpy()))])}
# and then save it:
tf_model_wrapper = TFModel(classifier_model)
tf.saved_model.save(tf_model_wrapper.model,
saved_model_path,
signatures={'serving_default':tf_model_wrapper.prediction}
)
Side Note: Apparently in TensorFlow v2.0 if signatures is omitted it should scan the object for the first #tf.function (according to this: https://www.tensorflow.org/api_docs/python/tf/saved_model/save) but in reality that doesn't seem to work. Instead, the model saves successfully with no errors and the #tf.function is not called, but default output is returned instead.
The error I get from the above is:
ValueError: Got a non-Tensor value <tf.Operation 'PartitionedCall' type=PartitionedCall> for key 'output_0' in the output of the function __inference_prediction_125493 used to generate the SavedModel signature 'serving_default'. Outputs for functions used as signatures must be a single Tensor, a sequence of Tensors, or a dictionary from string to Tensor.
I wrapped the result in tf.constant above because of this error, thinking it might be a quick fix, but I think it's me just being naive and not understanding Tensors properly.
I tried a bunch of other things before learning that [all outputs must be return values].1
How can I change the output to be as I want it to be?
You can see a Tensor as a multidimensional vector, i.e a structure with a fixed size and dimension and containing elements sharing the same type. Your return value is a map between a string and a list of dictionaries. A list of dictionaries cannot be converted to a tensor, because there is no guarantee that the number of dimensions and their size is constant, nor a guarantee that each element is sharing the same type.
You could instead return the raw output of your network, which should be a tensor and do your post processing outside of tensorflow-serving.
If you really want to do something like in your question, you can use a Tensor of strings instead, and you could use some code like that:
labels = tf.constant(['label1', 'label2'])
# if your batch size is dynamic, you can use tf.shape on your results variable to find it at runtime
batch_size = 32
# assuming your model returns something with the shape (N,2)
results = tf.random.uniform((batch_size,2))
res_as_str = tf.strings.as_string(results, precision=5)
return {
"results": tf.stack(
[tf.tile(labels[None, :], [batch_size, 1]), res_as_str], axis=-1
)
}
The output will be a dictionary mapping the value "results" to a Tensor of dimensions (Batch, number of labels, 2), the last dimension containing the label name and its corresponding value.

Using static rnn getting TypeError: Cannot convert value None to a TensorFlow DType

First some of my code:
...
fc_1 = layers.Dense(256, activation='relu')(drop_reshape)
bi_LSTM_2 = layers.Lambda(buildGruLayer)(fc_1)
...
def buildGruLayer(inputs):
gru_cells = []
gru_cells.append(tf.contrib.rnn.GRUCell(256))
gru_cells.append(tf.contrib.rnn.GRUCell(128))
gru_layers = tf.keras.layers.StackedRNNCells(gru_cells)
inputs = tf.unstack(inputs, axis=1)
outputs, _ = tf.contrib.rnn.static_rnn(
gru_layers,
inputs,
dtype='float32')
return outputs
Error I am getting when running static_rnn is:
raise TypeError("Cannot convert value %r to a TensorFlow DType." % type_value)
TypeError: Cannot convert value None to a TensorFlow DType.
The shape that comes into the Layer in the shape (64,238,256).
Anyone has a clue what the problem could be. I already googled the error but couldn't find anything. Any help is much appreciated.
If anyone still needs a solution to this. Its because you need to specify the dtype for the GRUCell, e.g tf.float32
Its default is None which in the documentation defaults to the first dimension of your input data (i.e batch dimension, which in tensorflow is a ? or None)
Check the dtype argument from :
https://www.tensorflow.org/api_docs/python/tf/compat/v1/nn/rnn_cell/GRUCell

tensorflow record with float numpy array

I want to create tensorflow records to feed my model;
so far I use the following code to store uint8 numpy array to TFRecord format;
def _int64_feature(value):
return tf.train.Feature(int64_list=tf.train.Int64List(value=[value]))
def _bytes_feature(value):
return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value]))
def _floats_feature(value):
return tf.train.Feature(float_list=tf.train.FloatList(value=[value]))
def convert_to_record(name, image, label, map):
filename = os.path.join(params.TRAINING_RECORDS_DATA_DIR, name + '.' + params.DATA_EXT)
writer = tf.python_io.TFRecordWriter(filename)
image_raw = image.tostring()
map_raw = map.tostring()
label_raw = label.tostring()
example = tf.train.Example(features=tf.train.Features(feature={
'image_raw': _bytes_feature(image_raw),
'map_raw': _bytes_feature(map_raw),
'label_raw': _bytes_feature(label_raw)
}))
writer.write(example.SerializeToString())
writer.close()
which I read with this example code
features = tf.parse_single_example(example, features={
'image_raw': tf.FixedLenFeature([], tf.string),
'map_raw': tf.FixedLenFeature([], tf.string),
'label_raw': tf.FixedLenFeature([], tf.string),
})
image = tf.decode_raw(features['image_raw'], tf.uint8)
image.set_shape(params.IMAGE_HEIGHT*params.IMAGE_WIDTH*3)
image = tf.reshape(image_, (params.IMAGE_HEIGHT,params.IMAGE_WIDTH,3))
map = tf.decode_raw(features['map_raw'], tf.uint8)
map.set_shape(params.MAP_HEIGHT*params.MAP_WIDTH*params.MAP_DEPTH)
map = tf.reshape(map, (params.MAP_HEIGHT,params.MAP_WIDTH,params.MAP_DEPTH))
label = tf.decode_raw(features['label_raw'], tf.uint8)
label.set_shape(params.NUM_CLASSES)
and that's working fine. Now I want to do the same with my array "map" being a float numpy array, instead of uint8, and I could not find examples on how to do it;
I tried the function _floats_feature, which works if I pass a scalar to it, but not with arrays;
with uint8 the serialization can be done by the method tostring();
How can I serialize a float numpy array and how can I read that back?
FloatList and BytesList expect an iterable. So you need to pass it a list of floats. Remove the extra brackets in your _float_feature, ie
def _floats_feature(value):
return tf.train.Feature(float_list=tf.train.FloatList(value=value))
numpy_arr = np.ones((3,)).astype(np.float)
example = tf.train.Example(features=tf.train.Features(feature={"bytes": _floats_feature(numpy_arr)}))
print(example)
features {
feature {
key: "bytes"
value {
float_list {
value: 1.0
value: 1.0
value: 1.0
}
}
}
}
I will expand on the Yaroslav's answer.
Int64List, BytesList and FloatList expect an iterator of the underlying elements (repeated field). In your case you can use a list as an iterator.
You mentioned: it works if I pass a scalar to it, but not with arrays. And this is expected, because when you pass a scalar, your _floats_feature creates an array of one float element in it (exactly as expected). But when you pass an array you create a list of arrays and pass it to a function which expects a list of floats.
So just remove construction of the array from your function: float_list=tf.train.FloatList(value=value)
I've stumbled across this while working on a similar problem. Since part of the original question was how to read back the float32 feature from tfrecords, I'll leave this here in case it helps anyone:
If map.ravel() was used to input map of dimensions [x, y, z] into _floats_feature:
features = {
...
'map': tf.FixedLenFeature([x, y, z], dtype=tf.float32)
...
}
parsed_example = tf.parse_single_example(serialized=serialized, features=features)
map = parsed_example['map']
Yaroslav's example failed when a nd array was the input:
numpy_arr = np.ones((3,3)).astype(np.float)
I found that it worked when I used numpy_arr.ravel() as the input. But is there a better way to do it?
First of all, many thanks to Yaroslav and Salvador for their enlightening answers.
According to my experience, their methods only works when the input is a 1D NumPy array as the size of (n, ). When the input is a Numpy array with the dimension of more than 2, the following error info appears:
def _float_feature(value):
return tf.train.Feature(float_list=tf.train.FloatList(value=value))
numpy_arr = np.arange(12).reshape(2, 2, 3).astype(np.float)
example = tf.train.Example(features=tf.train.Features(feature={"bytes":
_float_feature(numpy_arr)}))
print(example)
TypeError: array([[0., 1., 2.],
[3., 4., 5.]]) has type numpy.ndarray, but expected one of: int, long, float
So, I'd like to expand on Tsuan's answer, that is, flattening the input before it was fed into the TF example. The modified code is as follows:
def _floats_feature(value):
return tf.train.Feature(float_list=tf.train.FloatList(value=value))
numpy_arr = np.arange(12).reshape(2, 2, 3).astype(np.float).flatten()
example = tf.train.Example(features=tf.train.Features(feature={"bytes":
_float_feature(numpy_arr)}))
print(example)
In addition, np.flatten() is more applicable than np.ravel().
Use tfrmaker, a TFRecord utility package. You can install the package with pip:
pip install tfrmaker
Then you could create tfrecords like this:
from tfrmaker import images
# mapping label names with integer encoding.
LABELS = {"bishop": 0, "knight": 1, "pawn": 2, "queen": 3, "rook": 4}
# specifiying data and output directories.
DATA_DIR = "datasets/chess/"
OUTPUT_DIR = "tfrecords/chess/"
# create tfrecords from the images present in the given data directory.
info = images.create(DATA_DIR, LABELS, OUTPUT_DIR)
# info contains a list of information (path: releative path, size: no of images in the tfrecord) about created tfrecords
print(info)
The package also has some cool features like:
dynamic resizing
splitting tfrecords into optimal shards
spliting training, validation, testing of tfrecords
count no of images in tfrecords
asynchronous tfrecord creation
NOTE: This package currently supports image datasets that are organised as directories with class names as sub directory names.

How can I use tf.string_split() in tensorflow?

I want to get the extension of image files to invoke different image decoder, and I found there's a function called tf.string_split in tensorflow r0.11.
filename_queue = tf.train.string_input_producer(filenames, shuffle=shuffle)
reader = tf.WholeFileReader()
img_src, img_bytes = reader.read(filename_queue)
split_result = tf.string_split(img_src, '.')
But when I run it, I get this error:
ValueError: Shape must be rank 1 but is rank 0 for 'StringSplit' (op: 'StringSplit') with input shapes: [], [].
I think it may caused by the shape inference of img_src. I try to use img_src.set_shape([1,]) to fix it, but it seems not work, I get this error:
ValueError: Shapes () and (1,) are not compatible
Also, I can't get the shape of img_src using
tf.Print(split_result, [tf.shape(img_src)],'img_src shape=')
The result is img_src shape=[]. But if I use the following code:
tf.Print(split_result, [img_src],'img_src=')
The result is img_src=test_img/test1.png. Am I doing something wrong?
Just pack img_src into a tensor.
split_result = tf.string_split([img_src], '.')