Concatenation of Ragged Tensors Raises ValueError - tensorflow2.0

I am following the video on YouTube here; it shows the code
text_1 = tf.ragged.constant(
    [['who', 'is', 'Goerge', 'Washington'],
     ['What', 'is', 'the', 'weather', 'tomorrow']])
text_2 = tf.ragged.constant(['goodnight'])
text = tf.concat(text_1, text_2)
print(text)
But it raises the following ValueError:
ValueError: Tensor conversion requested dtype int32 for Tensor with
dtype string:
What is wrong please?

In the docs it says that concat takes a list of tensors and an axis as arguments, like so
text = tf.concat([text_1, text_2], axis=-1)
This raises a ValueError because the shapes of the tensors don't match. Please specify what you want to achieve.
Edit:
In the video you linked to there appears to be a syntax error in this line: text_2 = tf.ragged.constant(['goodnight']]). (The brackets don't match.) It should really be text_2 = tf.ragged.constant([['goodnight']]), which achieves the result printed below the operation in the video.

tf.concat requires a single list of tensors and an axis, and text_2 should have the same number of dimensions as text_1:
text_1 = tf.ragged.constant(
    [['who', 'is', 'Goerge', 'Washington'],
     ['What', 'is', 'the', 'weather', 'tomorrow']])
text_2 = tf.ragged.constant([['goodnight']])
text = tf.concat([text_1, text_2], 0)
print(text)
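If you instead want to append words to each sentence (concatenate along axis=1), text_2 needs the same number of rows as text_1. A minimal sketch with made-up values:
text_2 = tf.ragged.constant([['goodnight'], ['please']])
text = tf.concat([text_1, text_2], axis=1)
# Each row of text_2 is appended to the matching row of text_1:
# [['who', 'is', 'Goerge', 'Washington', 'goodnight'],
#  ['What', 'is', 'the', 'weather', 'tomorrow', 'please']]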

Related

could not broadcast input array from shape (10,1) into shape (10,)

If I do the following, the code works:
df_sma = btalib.sma(df['price'].loc[symbol],period=5).df
df.loc[[symbol],'sma'] = df_sma.values
However, if I just add .iloc[-10:] to it:
df_sma = btalib.sma(df['price'].loc[symbol].iloc[-10:],period=5).df
df.loc[[symbol],'sma'].iloc[-10:] = df_sma.values
I get this error:
ValueError: could not broadcast input array from shape (10,1) into shape (10,)
What exactly changed, and why does it throw that error?
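The broadcast mismatch itself can be reproduced with plain numpy; here is a minimal sketch (my own illustration) of assigning a 2-D (10, 1) array, which is what df_sma.values is, into a 1-D length-10 target:
import numpy as np

target = np.zeros(10)       # 1-D, shape (10,), like the sliced 'sma' column
values = np.zeros((10, 1))  # 2-D, shape (10, 1), like df_sma.values
target[:] = values          # ValueError: could not broadcast input array from shape (10,1) into shape (10,)
target[:] = values.ravel()  # flattening to shape (10,) makes the assignment work
Note also that chained indexing such as df.loc[[symbol],'sma'].iloc[-10:] = ... assigns into an intermediate copy, so even with matching shapes the original df may not be updated.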

How to get vocabulary size in tensorflow_transform before apply_vocabulary?

Also posted the question at https://github.com/tensorflow/transform/issues/261
I am using tft in TFX and need to transform string-list class labels into multi-hot indicators inside preprocessing_fn. Essentially:
vocab = tft.vocabulary(inputs['label'])
outputs['label'] = tf.cast(
    tf.sparse.to_indicator(
        tft.apply_vocabulary(inputs['label'], vocab),
        vocab_size=VOCAB_SIZE,
    ),
    "int64",
)
I am trying to get VOCAB_SIZE from the result of tft.vocabulary, but couldn't find a way to satisfy both the deferred execution and the known-shape requirement. The closest I got (below) doesn't pass the saved model export, because the shape of label is unknown.
def _make_table_initializer(filename_tensor):
    return tf.lookup.TextFileInitializer(
        filename=filename_tensor,
        key_dtype=tf.string,
        key_index=tf.lookup.TextFileIndex.WHOLE_LINE,
        value_dtype=tf.int64,
        value_index=tf.lookup.TextFileIndex.LINE_NUMBER,
    )

def _vocab_size(deferred_vocab_filename_tensor):
    initializer = _make_table_initializer(deferred_vocab_filename_tensor)
    table = tf.lookup.StaticHashTable(initializer, default_value=-1)
    table_size = table.size()
    return table_size

deferred_vocab_and_filename = tft.vocabulary(inputs['label'])
vocab_applied = tft.apply_vocabulary(inputs['label'], deferred_vocab_and_filename)
vocab_size = _vocab_size(deferred_vocab_and_filename)
outputs['label'] = tf.cast(
    tf.sparse.to_indicator(vocab_applied, vocab_size=vocab_size),
    "int64",
)
Got
ValueError: Feature label (Tensor("Identity_3:0", shape=(None, None), dtype=int64)) had invalid shape (None, None) for FixedLenFeature: apart from the batch dimension, all dimensions must have known size [while running 'Analyze/CreateSavedModel[tf_v2_only]/CreateSavedModel']
Any idea how to achieve this?
As per this comment in the GitHub issue, you can use tft.experimental.get_vocabulary_size_by_name (link) to achieve the same.
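For illustration, a minimal sketch of how that could look inside preprocessing_fn, assuming a TFT version that ships tft.experimental.get_vocabulary_size_by_name and using 'label_vocab' as a made-up vocabulary name:
vocab = tft.vocabulary(inputs['label'], vocab_filename='label_vocab')
# Look the size up by the name given to tft.vocabulary above
vocab_size = tft.experimental.get_vocabulary_size_by_name('label_vocab')
outputs['label'] = tf.cast(
    tf.sparse.to_indicator(
        tft.apply_vocabulary(inputs['label'], vocab),
        vocab_size=vocab_size,
    ),
    "int64",
)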

Using static rnn getting TypeError: Cannot convert value None to a TensorFlow DType

First some of my code:
...
fc_1 = layers.Dense(256, activation='relu')(drop_reshape)
bi_LSTM_2 = layers.Lambda(buildGruLayer)(fc_1)
...
def buildGruLayer(inputs):
    gru_cells = []
    gru_cells.append(tf.contrib.rnn.GRUCell(256))
    gru_cells.append(tf.contrib.rnn.GRUCell(128))
    gru_layers = tf.keras.layers.StackedRNNCells(gru_cells)
    inputs = tf.unstack(inputs, axis=1)
    outputs, _ = tf.contrib.rnn.static_rnn(
        gru_layers,
        inputs,
        dtype='float32')
    return outputs
The error I am getting when running static_rnn is:
raise TypeError("Cannot convert value %r to a TensorFlow DType." % type_value)
TypeError: Cannot convert value None to a TensorFlow DType.
The shape that comes into the layer is (64, 238, 256).
Does anyone have a clue what the problem could be? I already googled the error but couldn't find anything. Any help is much appreciated.
If anyone still needs a solution to this: it's because you need to specify the dtype for the GRUCell, e.g. tf.float32.
Its default is None, which according to the documentation falls back to the first dimension of your input data (i.e. the batch dimension, which in TensorFlow is ? or None).
Check the dtype argument here:
https://www.tensorflow.org/api_docs/python/tf/compat/v1/nn/rnn_cell/GRUCell
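A minimal sketch of the suggested fix applied to buildGruLayer from the question (same code, with an explicit dtype passed to each GRUCell):
def buildGruLayer(inputs):
    gru_cells = []
    gru_cells.append(tf.contrib.rnn.GRUCell(256, dtype=tf.float32))
    gru_cells.append(tf.contrib.rnn.GRUCell(128, dtype=tf.float32))
    gru_layers = tf.keras.layers.StackedRNNCells(gru_cells)
    inputs = tf.unstack(inputs, axis=1)  # split the time axis into a list of tensors
    outputs, _ = tf.contrib.rnn.static_rnn(
        gru_layers,
        inputs,
        dtype='float32')
    return outputs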

How can I use tf.string_split() in tensorflow?

I want to get the extension of image files in order to invoke different image decoders, and I found there's a function called tf.string_split in TensorFlow r0.11.
filename_queue = tf.train.string_input_producer(filenames, shuffle=shuffle)
reader = tf.WholeFileReader()
img_src, img_bytes = reader.read(filename_queue)
split_result = tf.string_split(img_src, '.')
But when I run it, I get this error:
ValueError: Shape must be rank 1 but is rank 0 for 'StringSplit' (op: 'StringSplit') with input shapes: [], [].
I think it may be caused by the shape inference of img_src. I tried to use img_src.set_shape([1,]) to fix it, but it doesn't seem to work; I get this error:
ValueError: Shapes () and (1,) are not compatible
Also, I can't get the shape of img_src using
tf.Print(split_result, [tf.shape(img_src)],'img_src shape=')
The result is img_src shape=[]. But if I use the following code:
tf.Print(split_result, [img_src],'img_src=')
The result is img_src=test_img/test1.png. Am I doing something wrong?
Just pack img_src into a rank-1 tensor by wrapping it in a list:
split_result = tf.string_split([img_src], '.')
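tf.string_split returns a SparseTensor, so the extension is the last entry of its values. A sketch (assuming a TF version that supports negative indexing on tensors):
split_result = tf.string_split([img_src], '.')  # SparseTensor of path components
extension = split_result.values[-1]             # e.g. 'png' for 'test_img/test1.png'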

Sort Numpy array by subfield

I have a structured numpy array, in which one of the fields has subfields:
import numpy, string, random
dtype = [('name', 'a10'), ('id', 'i4'),
         ('size', [('length', 'f8'), ('width', 'f8')])]
a = numpy.zeros(10, dtype = dtype)
for idx in range(len(a)):
    a[idx] = (''.join(random.sample(string.ascii_lowercase, 10)), idx,
              numpy.random.uniform(0, 1, size=[1, 2]))
I can easily get it sorted by any of the fields, like this:
a.sort(order = ['name'])
a.sort(order = ['size'])
When I try to sort it by a structured field ('size' in this example), it effectively gets sorted by the first subfield ('length' in this example). However, I would like to have my elements sorted by 'width'. I tried something like this, but it does not work:
a.sort(order = ['size[\'width\']'])
ValueError: unknown field name: size['width']
a.sort(order = ['size', 'width'])
ValueError: unknown field name: width
Therefore, I wonder, if there is a way to accomplish the task?
I believe this is what you want:
a[a["size"]["width"].argsort()]