TensorFlow: write a custom op using existing ops

I am trying to write my own op using existing TensorFlow ops (here, Cast). Pseudocode:
- Run input through "Cast", and generate output1
- Run output1 through "Cast", and generate output2
- Return output2
The above requirement is very simple, but I cannot find any example code on the TensorFlow website or in the codebase that does anything similar, and I have yet to come across any documentation for the TensorFlow codebase. Any pointers are appreciated.
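If the goal is a reusable op built only from existing ops, a C++ custom op may not be needed at all: calling tf.cast twice inside a tf.function traces both calls into one graph containing two Cast ops, which matches the pseudocode above. A minimal sketch, assuming TF 2.x; the name double_cast and the intermediate/output dtypes are hypothetical choices.

```python
import tensorflow as tf

@tf.function
def double_cast(x, intermediate_dtype=tf.int32, output_dtype=tf.float32):
    # Step 1: run the input through Cast to produce output1.
    # (float -> int casting truncates toward zero.)
    output1 = tf.cast(x, intermediate_dtype)
    # Step 2: run output1 through Cast to produce output2.
    output2 = tf.cast(output1, output_dtype)
    # Step 3: return output2.
    return output2

out = double_cast(tf.constant([1.7, -2.3]))
print(out.numpy())  # [ 1. -2.]

# The traced graph contains exactly the two Cast ops from the pseudocode.
cf = double_cast.get_concrete_function(tf.TensorSpec([None], tf.float32))
print([op.type for op in cf.graph.get_operations()])
```

A C++ op with its own kernel is generally only worth writing when the computation cannot be expressed with existing ops.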

Related

Adding a loss in tf.keras purely in terms of the outputs

I am trying to enforce a structural prior on the two outputs of my network.
The network has two heads and each predicts a different quantity: f_1(x) and f_2(x) (and each branch has its own separate loss function).
I am basically trying to add a third loss that is a function of only the outputs: L(f_1(x), f_2(x)).
I know this can easily be done with a custom TF training loop, but any idea how to accomplish this in Keras? (I am using tf.data to feed the input data.)
Thank you
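One way to keep using model.fit, assuming TF 2.x Keras, is a pass-through layer that calls self.add_loss on the two head outputs, so L(f_1(x), f_2(x)) is added to the per-head losses. The layer name OutputPriorLoss, the squared-difference prior, its weight, and the toy architecture and data are all hypothetical stand-ins.

```python
import numpy as np
import tensorflow as tf

class OutputPriorLoss(tf.keras.layers.Layer):
    """Pass-through layer that adds weight * L(f1, f2) to the model's total loss."""

    def __init__(self, weight=0.1, **kwargs):
        super().__init__(**kwargs)
        self.weight = weight

    def call(self, inputs):
        f1, f2 = inputs
        # Hypothetical prior L: penalize disagreement between the two heads.
        self.add_loss(self.weight * tf.reduce_mean(tf.square(f1 - f2)))
        return f1, f2  # outputs are returned unchanged

# Toy two-head network (architecture is illustrative only).
inputs = tf.keras.Input(shape=(8,))
hidden = tf.keras.layers.Dense(16, activation="relu")(inputs)
f1 = tf.keras.layers.Dense(1)(hidden)
f2 = tf.keras.layers.Dense(1)(hidden)
out1, out2 = OutputPriorLoss(weight=0.1)([f1, f2])
model = tf.keras.Model(inputs, [out1, out2])

# Each head keeps its own loss (given positionally); the prior is added by add_loss.
model.compile(optimizer="adam", loss=["mse", "mse"])

# Random data fed through tf.data, as in the question.
x = np.random.rand(64, 8).astype("float32")
y1 = np.random.rand(64, 1).astype("float32")
y2 = np.random.rand(64, 1).astype("float32")
ds = tf.data.Dataset.from_tensor_slices((x, (y1, y2))).batch(16)
history = model.fit(ds, epochs=2, verbose=0)
```

Because the prior is registered through add_loss, it needs no target of its own, and the reported "loss" is the sum of both head losses plus the prior.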

Does it make sense to use Tensorflow Dataset over a Keras DataGenerator?

I am training a model using tf.keras, and I have many small .npy files with single observations in a folder on local disk. I have built a DataGenerator (keras.utils.Sequence) class and it works correctly, although I get a warning:
'tensorflow:multiprocessing can interact badly with TensorFlow, causing nondeterministic deadlocks. For high performance data pipelines tf.data is recommended.'
I found that I can simply create something like this:
ds = tf.data.Dataset.from_generator(
    DataGenerator, args=[...],
    output_types=(tf.float16, tf.uint8),
    output_shapes=([None,256,256,3], [None,256,256,1]),
)
and then my Keras DataGenerator would work as a single-file reader, with a TF Dataset as the interface for creating batches. My question is: does this make any sense? Would it be safer? Would it read the next batch while the previous batch is being trained on, when using plain model.fit?
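A minimal sketch of that split, assuming each observation is stored as a pair of .npy files (one input, one mask) with the shapes and dtypes from the question; make_dataset and this file layout are hypothetical. The generator reads one observation at a time, tf.data does the batching, and prefetch(tf.data.AUTOTUNE) lets the pipeline prepare the next batch while model.fit trains on the current one. It uses output_signature, which replaces the deprecated output_types/output_shapes arguments in newer TF 2.x releases.

```python
import os
import tempfile

import numpy as np
import tensorflow as tf

def make_dataset(paths_x, paths_y, batch_size=4):
    # Single-observation reader: one (input, mask) pair per step.
    def gen():
        for px, py in zip(paths_x, paths_y):
            yield np.load(px).astype(np.float16), np.load(py).astype(np.uint8)

    ds = tf.data.Dataset.from_generator(
        gen,
        output_signature=(
            tf.TensorSpec(shape=(256, 256, 3), dtype=tf.float16),
            tf.TensorSpec(shape=(256, 256, 1), dtype=tf.uint8),
        ),
    )
    # Batch in tf.data and overlap loading with training.
    return ds.batch(batch_size).prefetch(tf.data.AUTOTUNE)

# Demo: write 6 dummy observations to a temporary folder.
tmp_dir = tempfile.mkdtemp()
paths_x, paths_y = [], []
for k in range(6):
    px = os.path.join(tmp_dir, f"x_{k}.npy")
    py = os.path.join(tmp_dir, f"y_{k}.npy")
    np.save(px, np.zeros((256, 256, 3), dtype=np.float16))
    np.save(py, np.zeros((256, 256, 1), dtype=np.uint8))
    paths_x.append(px)
    paths_y.append(py)

ds = make_dataset(paths_x, paths_y, batch_size=4)
for xb, yb in ds:
    print(xb.shape, yb.shape)  # batches of 4, then the remaining 2
```

The Python generator still runs under the GIL, so file reading itself is not parallelized; a common alternative for that is starting from a dataset of file paths and loading them with map(..., num_parallel_calls=tf.data.AUTOTUNE).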

Outputting multiple loss components to tensorboard from tensorflow estimators

I am pretty new to TensorFlow and I am struggling to get TensorBoard to display some of my custom metrics. The model I am working with is a tf.estimator.Estimator with an associated EstimatorSpec. The first new metric I am trying to log comes from my loss function, which has two components: a loss for an age prediction (tf.float32) and a loss for a class prediction (one-hot/multiclass). I add them together to get the total loss (my model predicts both a class and an age). The total loss is output just fine during training and shows up in TensorBoard, but I would also like to track the individual age and class prediction loss components.
I think a solution that is supposed to work is to add an eval_metric_ops argument to the EstimatorSpec, as described here (Custom eval_metric_ops in Estimator in Tensorflow). I have not been able to make this approach work, however. I defined a custom metric function that looks like this:
def age_loss_function(labels, ages_pred, ages_true):
    per_sample_age_loss = get_age_loss_per_sample(ages_pred, ages_true)  ### works fine
    #### The error happens on this line:
    mean_abs_age_diff, age_loss_update_fn = tf.metrics.Mean(per_sample_age_loss)
    ######
    return mean_abs_age_diff, age_loss_update_fn
eval_metric_ops = {"age_loss": age_loss_function}  #### Want to use this in EstimatorSpec
The instructions seem to say that I need both the metric value and the update op, both of which should be returned by the tf.metrics call, as in examples like the one I linked. But this call fails for me with the error message:
tensorflow.python.framework.errors_impl.OperatorNotAllowedInGraphError: using a `tf.Tensor` as a Python `bool` is not allowed in Graph execution. Use Eager execution or decorate this function with @tf.function.
I am probably just misusing the APIs. If someone can guide me on the proper usage I would really appreciate it. Thanks!
It looks like the problem was caused by a version change: I had updated to TensorFlow 2.0, while the instructions I was following were written for 1.x. Using tf.compat.v1.metrics.mean() instead gets past this problem.
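A likely reason for the original error is that in TF 2.x, tf.metrics.Mean is the Keras Mean metric class, whose first constructor argument is a name rather than the values. Building on the tf.compat.v1.metrics.mean() fix, here is a minimal sketch of the shape eval_metric_ops expects: a dict mapping a name to a (value_tensor, update_op) pair, not to the metric function itself. It runs in a plain TF1-style graph and session so it can execute outside an Estimator; age_loss_metric is a hypothetical name, and the absolute difference stands in for the question's get_age_loss_per_sample.

```python
import tensorflow as tf

def age_loss_metric(ages_pred, ages_true):
    # Stand-in for get_age_loss_per_sample (assumption: absolute error).
    per_sample_age_loss = tf.abs(ages_pred - ages_true)
    # TF1-style streaming metric: returns (value_tensor, update_op).
    return tf.compat.v1.metrics.mean(per_sample_age_loss)

graph = tf.Graph()
with graph.as_default():
    pred = tf.compat.v1.placeholder(tf.float32, shape=[None])
    true = tf.compat.v1.placeholder(tf.float32, shape=[None])
    mean_age_loss, update_op = age_loss_metric(pred, true)
    # This dict (of tuples, not functions) is what EstimatorSpec takes.
    eval_metric_ops = {"age_loss": (mean_age_loss, update_op)}
    # Streaming metrics keep their totals in local variables.
    init_local = tf.compat.v1.local_variables_initializer()

with tf.compat.v1.Session(graph=graph) as sess:
    sess.run(init_local)
    # Two "eval batches"; the metric accumulates across both.
    sess.run(update_op, {pred: [30.0, 40.0], true: [32.0, 40.0]})  # errors 2, 0
    sess.run(update_op, {pred: [50.0], true: [46.0]})              # error 4
    result = sess.run(mean_age_loss)
    print(result)  # (2 + 0 + 4) / 3 = 2.0
```

Inside a model_fn, the same dict would be passed as eval_metric_ops=eval_metric_ops to tf.estimator.EstimatorSpec in EVAL mode. For curves during training, a tf.compat.v1.summary.scalar("age_loss", ...) call in the model_fn should be picked up by the Estimator's summary saving; the class loss component can be handled the same way.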

How to use feature_column v2 in Tensorflow (TF-Ranking)

I'm using TF-Ranking to train a recommendation engine. I have encountered a problem that seems to be a version incompatibility issue concerning tf.feature_column API.
The short version of my question is: what is a v2 feature column (TF 2.0?) (see this for instance), and how can I ensure that my feature columns are treated as v2 while I'm still using TF 1.14?
Here are the details:
I'm unable to shorten my code sufficiently to provide a reproducible example. But I will try to describe the problem in words.
TF Version: 1.14
OS: Ubuntu 18.04
I initially had two features in my model, user and item, both sparse categorical features, each wrapped in its own tf.feature_column.embedding_column. I was able to use the train_and_evaluate method of the Estimator and export the model for serving.
Then I added a new feature curr_item which is only present during prediction (as a context feature). This shares the embeddings with item. So now I have a tf.feature_column.shared_embedding_columns which wraps both item and current_item.
Now calling train_and_evaluate results in the following error (shortened messages):
ValueError: Could not load all requested variables from checkpoint. Please make sure your model_fn does not expect variables that were not saved in the checkpoint.
Key input_layer/user_embedding/embedding_weights not found in checkpoint
Note that calling only the train method works fine. My understanding is that once it gets to evaluation, it tries to load the variables from the checkpoint, but that variable doesn't exist. I did a little debugging and found the reason:
When encode_listwise_features is called during training (which in turn calls encode_features) all features (user and item) are "V2" (not sure what that means) and so the following if statement holds:
https://github.com/tensorflow/ranking/blob/31fc134816cc4974a46a11e7bb2df0066d0a88f0/tensorflow_ranking/python/feature.py#L92
and both variables are named with an encoding_layer prefix (scope name?):
encoding_layer/user_embedding/embedding_weights
encoding_layer/item_embedding/embedding_weights
But when I call the same function for all three features (I am a little confused whether this is in eval or predict mode), some of them are not "V2", so we end up in the else branch of the above condition, which calls input_layer directly, and the variables are named with an input_layer prefix. Now TF is trying to restore
input_layer/user_embedding/embedding_weights
from the check-point, but that name doesn't exist in the checkpoint, because it was called
encoding_layer/user_embedding/embedding_weights
in training.
So:
1) How can I ensure that all my features are treated as v2 at all stages? I tried using tf.compat.v2.feature_column, but that didn't help. There is already a TODO note about this above that if statement.
2) Can encode_features be modified to avoid this situation, e.g. by raising an exception with a helpful message?

Training multiple Keras models in one script

I want to train different Keras models (or in some cases just multiple runs of the same model to compare the results) in a queue (using TensorFlow as the backend if that matters). In my current setup I create and fit all of these models in one big python script, e.g. (in a simplified way):
for i in range(10):
    model = create_model(i)
    model.compile(...)
    model.fit(...)
    some_function_to_save_model(model)
The create_model(i) function creates the specific model for the i'th run. This includes changing the number of inputs / labels for example. The compile function can be different (e.g. different optimizer) for each run as well.
While this code works for me and I have not found any problems, I am unclear if this is the correct way to do it because all of the models reside in the same TensorFlow Graph (if I understand the way Keras / TensorFlow work together correctly). My questions are:
Is this the correct way to run multiple independent models? (I do not want the i-th run to have any influence on the (i+1)-th run.)
Is running the models from different Python scripts (in this example model1.py, model2.py, ... model9.py) in any way technically better (I am not referring to readability / reproducibility here), because each model would then have its own separate TensorFlow Graph / Session?
Does clearing the Session / deleting the Graph via keras.backend.clear_session() have any influence in this case if it is run after the save function (some_function_to_save_model() inside the for loop)? Is this in some way beneficial compared to the current setup?
Once again: I am not concerned with the code-organization problems that might arise from cramming all models into one script instead of having one script per model that only creates and trains that model.
Unfortunately I did not find a concise answer to this (only suggestions using both methods). Maybe someone here can enlighten me?
Edit: Maybe I should be more precise. Basically I would like to have a technical explanation regarding the differences (advantages & disadvantages) of the following three cases:
1) create_and_train.py:
for i in range(10):
    model = create_model(i)
    model.compile(...)
    model.fit(...)
    some_function_to_save_model(model)
2) create_and_train.py:
for i in range(10):
    model = create_model(i)
    model.compile(...)
    model.fit(...)
    some_function_to_save_model(model)
    # clear session:
    keras.backend.clear_session()
3) create_and_train_i.py with i in [0, 1, ..., 9]:
i = 5  # (e.g.)
model = create_model(i)
model.compile(...)
model.fit(...)
some_function_to_save_model(model)
and e.g. a bash script that loops through these.
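A minimal sketch of the second variant (clear_session inside the loop) with a per-run seed, assuming TF 2.x; create_model here is a hypothetical stand-in for the question's function, and the random data is made up. In TF 2.x eager mode, models are not all built into one global graph the way TF1 graph-mode Keras did; clear_session mainly resets Keras's global state (for example the counters behind auto-generated layer names) so it does not keep growing across many models. Running each model in its own process (the third variant) gives the strongest isolation, since TensorFlow generally keeps the GPU memory it has reserved until the process exits.

```python
import numpy as np
import tensorflow as tf

def create_model(i):
    # Hypothetical stand-in: the hidden width depends on the run index i.
    return tf.keras.Sequential([
        tf.keras.Input(shape=(4,)),
        tf.keras.layers.Dense(8 + i, activation="relu"),
        tf.keras.layers.Dense(1),
    ])

x = np.random.rand(32, 4).astype("float32")
y = np.random.rand(32, 1).astype("float32")

final_losses = []
for i in range(3):
    # Reset Keras global state left over from the previous run.
    tf.keras.backend.clear_session()
    # Seed Python, NumPy and TF so each run is reproducible on its own.
    tf.keras.utils.set_random_seed(i)
    model = create_model(i)
    model.compile(optimizer="adam", loss="mse")
    history = model.fit(x, y, epochs=2, batch_size=8, verbose=0)
    final_losses.append(history.history["loss"][-1])
    # some_function_to_save_model(model) would go here.
    del model

print(final_losses)
```

Calling clear_session at the start of each iteration, as here, or after saving, as in the second variant, should behave the same for the following run.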