How can I use tf.string_split() in tensorflow?

I want to get the extensions of image files in order to invoke different image decoders, and I found there's a function called tf.string_split in TensorFlow r0.11.
filename_queue = tf.train.string_input_producer(filenames, shuffle=shuffle)
reader = tf.WholeFileReader()
img_src, img_bytes = reader.read(filename_queue)
split_result = tf.string_split(img_src, '.')
But when I run it, I get this error:
ValueError: Shape must be rank 1 but is rank 0 for 'StringSplit' (op: 'StringSplit') with input shapes: [], [].
I think it may be caused by the shape inference of img_src. I tried to use img_src.set_shape([1,]) to fix it, but it doesn't seem to work; I get this error:
ValueError: Shapes () and (1,) are not compatible
Also, I can't get the shape of img_src using
tf.Print(split_result, [tf.shape(img_src)],'img_src shape=')
The result is img_src shape=[]. But if I use the following code:
tf.Print(split_result, [img_src],'img_src=')
The result is img_src=test_img/test1.png. Am I doing something wrong?

Just pack img_src into a list so that tf.string_split receives a rank-1 tensor:
split_result = tf.string_split([img_src], '.')
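For reference (not part of the original answer), tf.string_split returns a SparseTensor whose .values field is a rank-1 string tensor of the split pieces, so the extension should be its last element. A minimal sketch, assuming the graph-mode API from the question:
split_result = tf.string_split([img_src], '.')
# .values holds the split pieces; take the last one as the file extension
extension = tf.gather(split_result.values, tf.size(split_result.values) - 1)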

Related

convert tf.dense Tensor to tf.one_hot Tensor on Graph execution Tensorflow

TF version: 2.11
I am trying to train a simple two-input classifier with a TFRecords tf.data pipeline.
I do not manage to convert the dense Tensor (containing only a scalar) into a tf.one_hot vector.
# get all recorddatasets abspath
training_names= [record_path+'/'+rec for rec in os.listdir(record_path) if rec.startswith('train')]
# load in tf dataset
train_dataset = tf.data.TFRecordDataset(training_names[1])
train_dataset = train_dataset.map(return_xy)
The mapping function:
def return_xy(example_proto):
    # parse example
    sample = parse_function(example_proto)
    # decode image 1
    encoded_image1 = sample['image/encoded_1']
    decoded_image1 = decode_image(encoded_image1)
    # decode image 2
    encoded_image2 = sample['image/encoded_2']
    decoded_image2 = decode_image(encoded_image2)
    # decode label
    print(f"image/object/class/'+level: {sample['image/object/class/'+level]}")
    class_label = tf.sparse.to_dense(sample['image/object/class/'+level])
    print(f'type of class label :{type(class_label)}')
    print(class_label)
    # conversion to one-hot with depth 26 -> how can I extract only the value or convert directly to tf.one_hot?
    label_onehot = tf.one_hot(class_label, 26)
    # resizing images
    input_left = tf.image.resize(decoded_image1, [416, 416])
    input_right = tf.image.resize(decoded_image2, [416, 416])
    return {'input_3res1': input_left, 'input_5res2': input_right}, label_onehot
output:
image/object/class/'+level: SparseTensor(indices=Tensor("ParseSingleExample/ParseExample/ParseExampleV2:14", shape=(None, 1), dtype=int64), values=Tensor("ParseSingleExample/ParseExample/ParseExampleV2:31", shape=(None,), dtype=int64), dense_shape=Tensor("ParseSingleExample/ParseExample/ParseExampleV2:48", shape=(1,), dtype=int64))
type of class label :<class 'tensorflow.python.framework.ops.Tensor'>
Tensor("SparseToDense:0", shape=(None,), dtype=int64)
However, I am sure that the label is in this tensor, because when I run it eagerly:
raw_dataset = tf.data.TFRecordDataset([rec_file])
parsed_dataset = raw_dataset.map(parse_function)  # only parsing
for sample in parsed_dataset:
    class_label = tf.sparse.to_dense(sample['image/object/class/label_level3'])[0]
    print(f'type of class label :{type(class_label)}')
    print(f'labels from labelmap :{class_label}')
I get output:
type of class label :<class 'tensorflow.python.framework.ops.EagerTensor'>
labels from labelmap :7
If I just choose a random number for the label and pass it to tf.one_hot(randint, 26), then the model begins to train (obviously nonsensically).
So the question is: how can I convert the
Tensor("SparseToDense:0", shape=(None,), dtype=int64)
to a
Tensor("one_hot:0", shape=(26,), dtype=float32)
What I tried so far
In the call data.map(parse_xy):
I tried to just call .numpy() on the tf tensors, but that didn't work; it only works for eager tensors.
In my understanding I cannot use eager execution, because everything in the parse_xy function gets executed on the whole graph:
I have already tried to enable eager execution -> failed.
https://www.tensorflow.org/api_docs/python/tf/config/run_functions_eagerly
Note: This flag has no effect on functions passed into tf.data transformations as arguments.
tf.data functions are never executed eagerly and are always executed as a compiled Tensorflow Graph.
I have also tried to use tf.py_function, but this only returns another tf.Tensor with an unknown shape:
def get_onehot(tensor):
    class_label = tensor[0]
    return tf.one_hot(class_label, 26)
and added this line in parse_xy:
label_onehot=tf.py_function(func=get_onehot, inp=[class_label], Tout=tf.int64)
but there I always get an unknown shape, which I cannot just alter with .set_shape().
I was able to solve the issue by only using TensorFlow functions.
tf.gather allows indexing into a TensorFlow tensor:
class_label_gather = tf.sparse.to_dense(sample['image/object/class/'+level])
class_indices = tf.gather(tf.cast(class_label_gather,dtype=tf.int32),0)
label_onehot=tf.one_hot(class_indices,26)
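As a quick standalone check (a sketch with a dummy value, not from the original answer), reducing the dense label to a scalar index is what makes tf.one_hot return the desired (26,) shape:
import tensorflow as tf

# dummy stand-in for tf.sparse.to_dense(sample['image/object/class/'+level]), shape (1,)
class_label_gather = tf.constant([7], dtype=tf.int64)
class_indices = tf.gather(tf.cast(class_label_gather, dtype=tf.int32), 0)  # scalar index
label_onehot = tf.one_hot(class_indices, 26)
print(label_onehot.shape)  # (26,)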

How to get vocabulary size in tensorflow_transform before apply_vocabulary?

Also posted the question at https://github.com/tensorflow/transform/issues/261
I am using tft in TFX and need to transform string-list class labels into multi-hot indicators inside preprocessing_fn. Essentially:
vocab = tft.vocabulary(inputs['label'])
outputs['label'] = tf.cast(
    tf.sparse.to_indicator(
        tft.apply_vocabulary(inputs['label'], vocab),
        vocab_size=VOCAB_SIZE,
    ),
    "int64",
)
I am trying to get VOCAB_SIZE from the result of vocab, but I couldn't find a way to satisfy the deferred execution and known shapes. The closest I got (below) wouldn't pass the saved-model export, as the shape for label is unknown.
def _make_table_initializer(filename_tensor):
    return tf.lookup.TextFileInitializer(
        filename=filename_tensor,
        key_dtype=tf.string,
        key_index=tf.lookup.TextFileIndex.WHOLE_LINE,
        value_dtype=tf.int64,
        value_index=tf.lookup.TextFileIndex.LINE_NUMBER,
    )

def _vocab_size(deferred_vocab_filename_tensor):
    initializer = _make_table_initializer(deferred_vocab_filename_tensor)
    table = tf.lookup.StaticHashTable(initializer, default_value=-1)
    table_size = table.size()
    return table_size

deferred_vocab_and_filename = tft.vocabulary(inputs['label'])
vocab_applied = tft.apply_vocabulary(inputs['label'], deferred_vocab_and_filename)
vocab_size = _vocab_size(deferred_vocab_and_filename)
outputs['label'] = tf.cast(
    tf.sparse.to_indicator(vocab_applied, vocab_size=vocab_size),
    "int64",
)
Got
ValueError: Feature label (Tensor("Identity_3:0", shape=(None, None), dtype=int64)) had invalid shape (None, None) for FixedLenFeature: apart from the batch dimension, all dimensions must have known size [while running 'Analyze/CreateSavedModel[tf_v2_only]/CreateSavedModel']
Any idea how to achieve this?
As per this comment in the GitHub issue, you can use tft.experimental.get_vocabulary_size_by_name to achieve the same.
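A minimal sketch of how that could look inside preprocessing_fn, based on the question's code (the vocabulary name 'label_vocab' is hypothetical, and the lookup by name assumes the vocabulary is given an explicit vocab_filename):
vocab_filename = 'label_vocab'  # hypothetical vocabulary name, for illustration only
vocab = tft.vocabulary(inputs['label'], vocab_filename=vocab_filename)
vocab_size = tft.experimental.get_vocabulary_size_by_name(vocab_filename)
outputs['label'] = tf.cast(
    tf.sparse.to_indicator(
        tft.apply_vocabulary(inputs['label'], vocab),
        vocab_size=vocab_size,
    ),
    "int64",
)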

tensorflow.js getting Error when checking input: expected dense_Dense1_input to have 3 dimension(s). but got array with shape

A simple question, and I'm sure the answer is straightforward, but I'm really struggling to match the model shape with the tensor I'm fitting into the model.
This simple code:
let tf = require('@tensorflow/tfjs-node');
let features = {
    x: [1, 2, 3, 4, 5, 6, 7, 8, 9],
    y: [1, 2, 3, 4, 5, 6, 7, 8, 9]
}
let tensorfeature = tf.tensor2d(Object.values(features))
console.log(tensorfeature.shape)
const model = tf.sequential();
model.add(tf.layers.dense(
    {
        inputShape: tensorfeature.shape,
        units: 1
    }
))
const optimizer = tf.train.sgd(0.005);
model.compile({optimizer: optimizer, loss: 'meanAbsoluteError'});
model.fit(tensorfeature,
    {epochs: 5}
)
Results in Error: Error when checking input: expected dense_Dense1_input to have 3 dimension(s). but got array with shape 2,9
I have tried multiple things with reshape, slice, etc. with no luck. Can someone point out what exactly is wrong?
model.fit takes at least two parameters, x and y, which are either tensors or arrays of tensors. The config object is the third parameter.
Also, the feature tensor (tensorfeature) passed as an argument to model.fit should be one dimension higher than the inputShape of the model. Since tensorfeature.shape is used as the inputShape, if we want to train the model with tensorfeature, its dimension should be expanded. This can be done using reshape or expandDims.
model.fit(tensorfeature.expandDims(0))
// or possibly
model.fit(tensorfeature.reshape([1, ...tensorfeature.shape]))
This shape mismatch between the model and the training data has been discussed here and there

Using static rnn getting TypeError: Cannot convert value None to a TensorFlow DType

First some of my code:
...
fc_1 = layers.Dense(256, activation='relu')(drop_reshape)
bi_LSTM_2 = layers.Lambda(buildGruLayer)(fc_1)
...
def buildGruLayer(inputs):
    gru_cells = []
    gru_cells.append(tf.contrib.rnn.GRUCell(256))
    gru_cells.append(tf.contrib.rnn.GRUCell(128))
    gru_layers = tf.keras.layers.StackedRNNCells(gru_cells)
    inputs = tf.unstack(inputs, axis=1)
    outputs, _ = tf.contrib.rnn.static_rnn(
        gru_layers,
        inputs,
        dtype='float32')
    return outputs
Error I am getting when running static_rnn is:
raise TypeError("Cannot convert value %r to a TensorFlow DType." % type_value)
TypeError: Cannot convert value None to a TensorFlow DType.
The input that comes into the layer has the shape (64, 238, 256).
Does anyone have a clue what the problem could be? I have already googled the error but couldn't find anything. Any help is much appreciated.
If anyone still needs a solution to this: it's because you need to specify the dtype for the GRUCell, e.g. tf.float32.
Its default is None, which in the documentation defaults to the first dimension of your input data (i.e. the batch dimension, which in TensorFlow is a ? or None).
Check the dtype argument from :
https://www.tensorflow.org/api_docs/python/tf/compat/v1/nn/rnn_cell/GRUCell
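A sketch of that fix applied to the buildGruLayer function from the question (assuming the TF 1.x contrib API used there; the explicit dtype arguments are the only change):
def buildGruLayer(inputs):
    gru_cells = []
    # pass an explicit dtype so the cell variables are not built with dtype=None
    gru_cells.append(tf.contrib.rnn.GRUCell(256, dtype=tf.float32))
    gru_cells.append(tf.contrib.rnn.GRUCell(128, dtype=tf.float32))
    gru_layers = tf.keras.layers.StackedRNNCells(gru_cells)
    inputs = tf.unstack(inputs, axis=1)
    outputs, _ = tf.contrib.rnn.static_rnn(gru_layers, inputs, dtype='float32')
    return outputs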

Tensorflow: Read CSV

filename_queue = tf.train.string_input_producer([csv_file_path], shuffle=False)
reader = tf.TextLineReader()
_, serialized_example = reader.read(filename_queue)
filename = tf.decode_csv(serialized_example, record_defaults=[[""]], field_delim=',')
# Input
png = tf.read_file(filename)
I am reading from a CSV file with one column.
I am getting the following error.
ValueError: **Shape** must be rank 0 but is rank 1 for 'ReadFile' (op: 'ReadFile') with input shapes: [1].
Could someone tell me the issue?
tf.read_file() needs a scalar input (i.e., just one string), but the results of tf.decode_csv are coming back in a "rank 1" context, i.e., a 1-D list. You need to dereference the results:
filename = tf.decode_csv(serialized_example, record_defaults=[[""]], field_delim=',')
filename = filename[0] # <-- add this.
png = tf.read_file(filename)
For more detail, see the docs for tf.decode_csv -- note that the return type is a list of Tensor objects.
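For example (a sketch for a hypothetical two-column CSV, not the asker's single-column file), each entry in record_defaults yields its own scalar tensor, and that scalar is what tf.read_file accepts directly:
# hypothetical CSV with two columns: filename, label
filename_col, label_col = tf.decode_csv(serialized_example,
                                        record_defaults=[[""], [0]],
                                        field_delim=',')
png = tf.read_file(filename_col)  # filename_col is a rank-0 (scalar) string tensor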