Are there any examples of ONNX serializing torch models that take strings as inputs?
Passing an arbitrary type as args to torch.onnx.export yields:
RuntimeError: Only tuples, lists and Variables supported as JIT inputs/outputs. Dictionaries and strings are also accepted but their usage is not recommended. But got unsupported type numpy.ndarray
but there's also the explanation under the args definition that:
... Any non-Tensor arguments will
be hard-coded into the exported model; any Tensor arguments
will become inputs of the exported model, in the order they
occur in args. If args is a Tensor, this is equivalent
to having called it with a 1-ary tuple of that Tensor.
so is there a way to serialize a model like this:
model('hello world')
Out: [0.8, 0.2]
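For reference, a minimal sketch of the usual workaround (not from the original post): keep the string preprocessing on the host, so the exported graph only ever sees tensors. MyModel and tokenize below are illustrative stand-ins, not a real model or tokenizer.

import torch

class MyModel(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = torch.nn.Embedding(10000, 16)
        self.fc = torch.nn.Linear(16, 2)

    def forward(self, token_ids):                      # LongTensor, not str
        pooled = self.embed(token_ids).mean(dim=0)     # (seq_len, 16) -> (16,)
        return torch.softmax(self.fc(pooled), dim=-1)  # (2,), e.g. [0.8, 0.2]

def tokenize(text):  # hypothetical host-side tokenizer, runs outside ONNX
    return torch.tensor([hash(w) % 10000 for w in text.split()])

model = MyModel()
torch.onnx.export(model, (tokenize("hello world"),), "model.onnx",
                  input_names=["token_ids"], output_names=["probs"],
                  dynamic_axes={"token_ids": {0: "seq_len"}})

At inference time, model('hello world') then becomes two steps: tokenize on the host, and feed the resulting ids to the ONNX runtime session.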
Related
I am trying to use the Basic Text Classification example from TensorFlow on my own dataset. Training and verification have gone well, and I have reached the point in the tutorial where the model is exported. The model compiles and works on an array of strings.
After that, I'd like to save the model in h5 format for use in other projects. At this point, the tutorial refers you to save and load keras models tutorial.
This second tutorial essentially says to do this:
model.save('path/saved_model.h5')
This fails with
ValueError: Weights for model sequential_X have not yet been created. Weights are created when the Model is first called on inputs or build() is called with an input_shape.
So next I attempt to do this:
model.build((None, max_features))
model.save('path/saved_model.h5')
There are several errors with this:
ValueError: Tensor conversion requested dtype string for Tensor with dtype float32: <tf.Tensor 'Placeholder:0' shape=(None, 45000) dtype=float32>
TypeError: Input 'input' of 'StringLower' Op has type float32 that does not match expected type of string.
ValueError: You cannot build your model by calling build if your layers do not support float type inputs. Instead, in order to instantiate and build your model, call your model on real tensor data (of the correct dtype).
I think this essentially means that the input I defined to pass into model.build defaults to float and needs to be string. I think I have two options:
1. Somehow define my input layer to be string, which I cannot see how to do. This feels like the correct thing to do.
2. Use model.call. However, I am not sure how to 'call my model on real tensor data', since I don't see how a tensor can hold strings, and strings are the input to the network.
I've seen one other person with this issue here, with no solution other than to rebuild the model in functional style with mixed results. I am not sure of the point of rebuilding in the functional style since I don't fully understand the problem.
I'd prefer to have the TextVectorization layer built into the final model to simplify deployment. This is exactly the reason the docs give for doing this in the example in the first place. (The model will save without it.)
I am a novice with this so I might be making a simple mistake. How can I get this model to save?
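(For what it's worth, a minimal sketch of the "call your model on real tensor data" route: tensors can in fact hold strings via the tf.string dtype, so calling the model once on a constant string batch should create the weights. The sample text is illustrative; the SavedModel format is used here because HDF5 may still reject the TextVectorization layer.)

import tensorflow as tf

# `model` is the Sequential model from the tutorial above
sample = tf.constant(["an example review"])  # a real tf.string tensor
_ = model(sample)                            # builds the model's weights
model.save('path/saved_model')               # SavedModel directory format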
I have a generator that yields tf.sparse.SparseTensors. I want to turn this into a TensorFlow Dataset, but am running into some issues. I am using TF2. First, unlike regular Tensors, you cannot simply yield them from the generator (while providing the correct data types for output_types). For a sparse tensor of [1,0,0,0,5,0], the error looks like
tensorflow.python.framework.errors_impl.InvalidArgumentError: TypeError: `generator` yielded an element that could not be converted to the expected type. The expected type was int64, but the yielded element was SparseTensor(indices=tf.Tensor([[0] [4]], shape=(2, 1), dtype=int64), values=tf.Tensor([1 5], shape=(2,), dtype=int64), dense_shape=tf.Tensor([6], shape=(1,), dtype=int64)).
After doing some looking around on the internet, I found this open issue and tried to do something similar: https://github.com/tensorflow/tensorflow/issues/16689 - read the indices, values, and shape as separate tensors into a TF Dataset, and then map over the dataset to create the sparse tensor.

This is not working as shown in some of the examples in the GitHub issue - tf.sparse.SparseTensor(indices, values, shape) does not seem to accept indices and shape in the form of a tf.Tensor - it will happily take a list or numpy array, but not a Tensor. Since map is not eager, I also cannot call .numpy() on the Tensor.

What is the best way to get this to work? I see there is tf.py_function/tf.numpy_function which could help, but constructing the output type can be tricky (though not impossible) for my use case - the incoming data is not fixed and can have a mix of sparse and dense tensors.
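One possible approach, sketched below, assumes TF 2.3 or newer, where from_generator accepts an output_signature; tf.SparseTensorSpec then lets the generator yield SparseTensors directly, avoiding the indices/values/shape round trip.

import tensorflow as tf

def gen():
    yield tf.sparse.SparseTensor(indices=[[0], [4]],
                                 values=tf.constant([1, 5], tf.int64),
                                 dense_shape=[6])

ds = tf.data.Dataset.from_generator(
    gen,
    output_signature=tf.SparseTensorSpec(shape=(6,), dtype=tf.int64))

for st in ds:
    print(tf.sparse.to_dense(st))  # tf.Tensor([1 0 0 0 5 0], ...)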
I'm running tf2.0 in a conda environment, and would like to display a tensor in a figure.
plt.imshow(tmp)
TypeError: Image data of dtype object cannot be converted to float
tmp.dtype
tf.float32
So I tried converting it to a numpy array, but...
print(tmp.numpy())
AttributeError: 'Tensor' object has no attribute 'numpy'
tmp.eval()
ValueError: Cannot evaluate tensor using `eval()`: No default session is registered. Use `with sess.as_default()` or pass an explicit session to `eval(session=sess)`
I've read elsewhere that this is because I need an active session or eager execution. Eager execution should be enabled by default in tf2.0, but...
print(tf.__version__)
2.0.0-alpha0
tf.executing_eagerly()
False
tf.enable_eager_execution()
AttributeError: module 'tensorflow' has no attribute 'enable_eager_execution'
tf.compat.v1.enable_eager_execution()
None
tf.executing_eagerly()
False
sess = tf.Session()
AttributeError: module 'tensorflow' has no attribute 'Session'
I tried upgrading to 2.0.0b1, but the results were exactly the same (except tf.__version__).
Edit:
according to this answer, the problems are probably because I am trying to debug a function which is inside a tf.data.Dataset.map() call, which works with static graphs. So perhaps the question becomes "how do I debug these functions?"
The critical insight for me was that running the tf.data.Dataset.map() function builds a graph, and the graph is executed later as part of a data pipeline. So it is more about code generation, and eager execution doesn't apply. Besides the lack of eager execution, building a graph has other restrictions, including that all inputs and outputs must be tensors. Tensors don't support item assignment operations such as T[0] += 1.
Item assignment is a fairly common use case, so there is a straightforward solution: tf.py_function (previously tf.py_func). py_function works with numpy arrays as inputs and outputs, so you're free to make use of other numpy functions which have not yet been included in the tensorflow library.
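A minimal sketch of that pattern (the function and the dataset are illustrative):

import tensorflow as tf

def increment_first(x):
    x = x.numpy().copy()   # copy so the array is writable
    x[0] += 1              # item assignment, not possible on a tf.Tensor
    return x

ds = tf.data.Dataset.from_tensor_slices([[1, 2], [3, 4]])
ds = ds.map(lambda t: tf.py_function(increment_first, inp=[t], Tout=tf.int32))

for t in ds:
    print(t.numpy())       # [2 2], then [4 4]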
As usual, there is a trade-off: a py_function is interpreted on the fly by the python interpreter. So it won't be as fast as pre-compiled tensor operations. More importantly, the interpreter threads are not aware of each other, so there may be parallelisation issues.
There's a helpful explanation and demonstration of a py_function in the documentation: https://www.tensorflow.org/beta/guide/data
I'm using the TensorFlow Datasets API (https://www.tensorflow.org/guide/datasets) and, in particular, I'm using it with the TensorFlow Estimators API (https://www.tensorflow.org/guide/datasets_for_estimators), which recommends using a generator function.
I'm having trouble writing a generator function which yields features with different output types (e.g., a mix of int, float, and string). I've figured out how to specify feature and label types emitted from the generator... but only when all the label types are identical.

However, suppose you have a variety of feature types to emit. In the typical imports85 TensorFlow demonstration, for example, you would emit car make and model as strings (which later get categorized downstream), highway-mpg as float32, and number-of-doors as int. How does one specify the various feature types in the Dataset from_generator call?
dataset = tf.data.Dataset.from_generator(
    generator=self._generator,
    output_types=(tf.float32, tf.int32),
    output_shapes=(tf.TensorShape([None]), tf.TensorShape([1])))
I've already tried the obvious approach of using
output_types=((tf.float32, tf.float32, tf.string, tf.string), tf.int32)
without luck. Any help would be appreciated.
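(For reference, a nested output_types declaration of exactly that shape does work when the generator yields a matching nested structure; a minimal sketch with illustrative values:)

import tensorflow as tf

def gen():
    # (highway_mpg, price, make, model), label -- illustrative feature names
    yield (27.0, 13950.0, "audi", "100 ls"), 1

dataset = tf.data.Dataset.from_generator(
    gen,
    output_types=((tf.float32, tf.float32, tf.string, tf.string), tf.int32),
    output_shapes=((tf.TensorShape([]), tf.TensorShape([]),
                    tf.TensorShape([]), tf.TensorShape([])),
                   tf.TensorShape([])))

for features, label in dataset.take(1):
    print(features, label)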
From the official documentation:
It is not possible to have a tf.Tensor with more than one data type. It is possible, however, to serialize arbitrary data structures as strings and store those in tf.Tensors.
So you might need to serialize them as strings and then decode them using functions like tf.io.decode_raw, for example.
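A minimal sketch of that round trip (the values are illustrative):

import tensorflow as tf
import numpy as np

raw = np.array([1.5, 2.5], dtype=np.float32).tobytes()
s = tf.constant(raw)                       # dtype=tf.string holds raw bytes
decoded = tf.io.decode_raw(s, tf.float32)  # back to a float32 tensor
print(decoded.numpy())                     # [1.5 2.5]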
The docs say:
In addition, variants of these types with the _ref suffix are defined
for reference-typed tensors.
What exactly does this mean? What are reference-typed tensors and how do they differ from standard ones?
A reference-typed tensor is mutable. The most common way to create a reference-typed tensor is to define a tf.Variable: defining a tf.Variable whose initial value has dtype tf.float32 will create a reference-typed tensor with dtype tf.float32_ref. You can mutate a reference-typed tensor by passing it as the first argument to tf.assign().
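A minimal sketch, in TF 1.x style to match the era of this answer:

import tensorflow as tf  # TF 1.x

v = tf.Variable(1.0)
print(v.dtype)                    # <dtype: 'float32_ref'>
assign_op = tf.assign(v, 2.0)     # mutates the referenced tensor

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(assign_op))    # 2.0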
(Note that reference-typed tensors are something of an implementation detail in the present version of TensorFlow. We'd encourage you to use higher-level wrappers like tf.Variable, which may migrate to alternative representations for mutable state in the future.)