difficulty with imshow(), numpy() and eager execution in tf2.0 - tensorflow2.0

I'm running tf2.0 in a conda environment, and would like to display a tensor in a figure.
plt.imshow(tmp)
TypeError: Image data of dtype object cannot be converted to float
tmp.dtype
tf.float32
So I tried converting it to a numpy array, but...
print(tmp.numpy())
AttributeError: 'Tensor' object has no attribute 'numpy'
tmp.eval()
ValueError: Cannot evaluate tensor using `eval()`: No default session is registered. Use `with sess.as_default()` or pass an explicit session to `eval(session=sess)`
I've read elsewhere that this is because I need an active session or eager execution. Eager execution should be enabled by default in tf2.0, but...
print(tf.__version__)
2.0.0-alpha0
tf.executing_eagerly()
False
tf.enable_eager_execution()
AttributeError: module 'tensorflow' has no attribute 'enable_eager_execution'
tf.compat.v1.enable_eager_execution()
None
tf.executing_eagerly()
False
sess = tf.Session()
AttributeError: module 'tensorflow' has no attribute 'Session'
I tried upgrading to 2.0.0b1, but the results were exactly the same (except tf.__version__).
Edit:
According to this answer, the problems are probably because I am trying to debug a function which is inside a tf.data.Dataset.map() call, which works with static graphs. So perhaps the question becomes "how do I debug these functions?"

The critical insight for me was that tf.data.Dataset.map() traces the mapped function and builds a graph, and that graph is executed later as part of the data pipeline. So it is more about code generation, and eager execution doesn't apply. Besides the lack of eager execution, building a graph has other restrictions, including that all inputs and outputs must be tensors. Tensors don't support item assignment operations such as T[0] += 1.
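A quick way to see this (a minimal sketch I put together for TF 2.0, with a toy dataset):
import tensorflow as tf

# Demonstrates the point above: the mapped function is traced into a graph,
# so eager-only behaviour and item assignment are unavailable inside it.
print(tf.executing_eagerly())                              # True at the top level

def map_fn(t):
    print("eager inside map():", tf.executing_eagerly())   # False during tracing
    # t[0] += 1   # TypeError: a tf.Tensor does not support item assignment
    return t

ds = tf.data.Dataset.from_tensor_slices(tf.ones([4, 3])).map(map_fn)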
Item assignment is a fairly common use case, so there is a straightforward solution: tf.py_function (previously tf.py_func). py_function works with numpy arrays as inputs and outputs, so you're free to make use of other numpy functions which have not yet been included in the tensorflow library.
As usual, there is a trade-off: a py_function is interpreted on the fly by the Python interpreter, so it won't be as fast as pre-compiled tensor operations. More importantly, the wrapped code runs under the Python interpreter (and its GIL), so there may be parallelisation issues.
There's a helpful explanation and demonstration of a py_function in the documentation: https://www.tensorflow.org/beta/guide/data
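In the meantime, here is a minimal sketch of the pattern as I understand it (my own illustration, not the one from the docs; the function names are made up). The numpy-level function does the item assignment, and tf.py_function wraps it inside map():
import numpy as np
import tensorflow as tf

def increment_first(x):
    x = x.numpy().copy()   # inside py_function, x is an eager tensor; copy to get a writable array
    x[0] += 1              # item assignment works on the numpy array
    return x

def map_fn(x):
    y = tf.py_function(increment_first, inp=[x], Tout=tf.float32)
    y.set_shape(x.get_shape())   # static shape information is lost across py_function
    return y

ds = tf.data.Dataset.from_tensor_slices(np.zeros((4, 3), np.float32)).map(map_fn)
for element in ds:
    print(element.numpy())       # eager again outside the pipeline definition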

Related

How to use feature_column v2 in Tensorflow (TF-Ranking)

I'm using TF-Ranking to train a recommendation engine. I have encountered a problem that seems to be a version incompatibility issue concerning tf.feature_column API.
The short version of my question is: what is a v2 feature column (TF 2.0?) (see this for instance), and how can I ensure that my feature columns are treated as v2 while I'm still using TF 1.14?
Here are the details:
I'm unable to shorten my code sufficiently to provide a reproducible example. But I will try to describe the problem in words.
TF Version: 1.14
OS: Ubuntu 18.04
I initially had two features in my model, user and item, both sparse categorical features, each wrapped in its own tf.feature_column.embedding_column. I was able to use the train_and_evaluate method of the Estimator and export the model for serving.
Then I added a new feature curr_item which is only present during prediction (as a context feature). It shares its embeddings with item, so now I have a tf.feature_column.shared_embedding_columns which wraps both item and curr_item.
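Roughly, the column setup looks like this (the bucket sizes and dimensions below are placeholders for illustration, not my real values):
import tensorflow as tf

user = tf.feature_column.categorical_column_with_hash_bucket("user", hash_bucket_size=10000)
item = tf.feature_column.categorical_column_with_hash_bucket("item", hash_bucket_size=50000)
curr_item = tf.feature_column.categorical_column_with_hash_bucket("curr_item", hash_bucket_size=50000)

user_emb = tf.feature_column.embedding_column(user, dimension=32)
# item and curr_item share one embedding table:
item_emb, curr_item_emb = tf.feature_column.shared_embedding_columns(
    [item, curr_item], dimension=32)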
Now calling train_and_evaluate results in the following error (shortened messages):
ValueError: Could not load all requested variables from checkpoint. Please make sure your model_fn does not expect variables that were not saved in the checkpoint.
Key input_layer/user_embedding/embedding_weights not found in checkpoint
Note that calling the train method on its own works fine. My understanding is that once it gets to evaluation, it tries to load the variables from the checkpoint, but that variable doesn't exist. I did a little debugging and found the reason:
When encode_listwise_features is called during training (which in turn calls encode_features) all features (user and item) are "V2" (not sure what that means) and so the following if statement holds:
https://github.com/tensorflow/ranking/blob/31fc134816cc4974a46a11e7bb2df0066d0a88f0/tensorflow_ranking/python/feature.py#L92
and both variables are named with an encoding_layer prefix (scope name?):
encoding_layer/user_embedding/embedding_weights
encoding_layer/item_embedding/embedding_weights
But when the same function is called for all three features (I'm a little confused whether this happens in eval or predict mode), some of them are not "V2" and we end up in the else branch of the condition above, which calls input_layer directly, and the variables are named with an input_layer prefix. Now TF is trying to restore
input_layer/user_embedding/embedding_weights
from the check-point, but that name doesn't exist in the checkpoint, because it was called
encoding_layer/user_embedding/embedding_weights
in training.
So:
1) How can I ensure that all my features are treated as v2 at all stages? I tried using tf.compat.v2.feature_column but that didn't help. There is already a ToDo note above that if statement for this.
2) Can encode_features be modified to avoid this situation, e.g. by raising an exception with a helpful message?

tensor conversion function numpy() doesn't work within tf.estimator model function

I have tried this with both TensorFlow v2.0 and v1.12.0 (with tf.enable_eager_execution()). If I call numpy() with the code snippet shown below in my main() function, it works perfectly. However, if I use it in my estimator model function, i.e. model_fn(features, labels, mode, params), it complains that 'Tensor' object has no attribute 'numpy'.
ndarray = np.ones([3, 3])
tensor = tf.multiply(ndarray, 42)
print(tensor)
print(tensor.numpy())
Has anyone else experienced a similar problem? Seems like a big issue for tf.estimator, no?
It won't work. The Estimator API is tied to graph construction and doesn't fully support eager execution. As per the official documentation:
Calling methods of Estimator will work while eager execution is
enabled. However, the model_fn and input_fn is not executed eagerly
https://www.tensorflow.org/api_docs/python/tf/estimator/Estimator
TF 2.0 won't even support custom estimators, only premade ones.
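To illustrate the point about model_fn (this is my own minimal sketch using TF 1.x-style APIs via compat.v1, not code from the documentation): inside model_fn the tensors are symbolic even with eager enabled, so tensor.numpy() raises, while tf.print is a graph-compatible way to inspect values:
import tensorflow as tf

def model_fn(features, labels, mode, params):
    w = tf.compat.v1.get_variable("w", shape=[3, 3])
    tensor = tf.multiply(w, 42.0)
    # tensor.numpy()  # AttributeError here: model_fn is built as a graph
    print_op = tf.print("tensor:", tensor)
    with tf.control_dependencies([print_op]):        # make sure the print runs
        loss = tf.reduce_sum(tf.square(tensor))
    train_op = tf.compat.v1.train.GradientDescentOptimizer(0.01).minimize(
        loss, global_step=tf.compat.v1.train.get_or_create_global_step())
    return tf.estimator.EstimatorSpec(mode, loss=loss, train_op=train_op)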

From a design perspective, why aren't the error messages in eager and in graph mode the same in Tensorflow?

Why aren't the error messages in eager and in graph mode the same in Tensorflow?
For instance:
running in eager mode:
import tensorflow as tf
tf.enable_eager_execution()
a = tf.random_uniform(shape=[2, 3])
a_tile = tf.tile(a, [5])
print(a_tile)
yields the error message:
Expected multiples argument to be a vector of length 2 but got length 1 [Op:Tile]
While running in graph mode:
import tensorflow as tf
a = tf.random_uniform(shape=[2, 3])
a_tile = tf.tile(a, [5])
sess = tf.Session()
r1 = sess.run(a_tile)
print(r1)
yields the error message:
Shape must be rank 2 but is rank 1 for 'Tile' (op: 'Tile') with input shapes: [2,3], [1].
While this might seem like a nuance, in some cases the error message in one mode (e.g. eager) does not precisely indicate the actual error, for instance for tf.linspace (github).
Related to this is a difference in behavior between eager and graph modes, not just in the error messages, for instance tf.while_loop (github).
From a design perspective, how was eager mode incorporated into the existing TF? I feel that understanding the TF design can increase productivity.
A vast majority of code paths (such as the implementation of each operation) are exactly the same between eager and graph execution, but not all.
When not using eager execution, i.e. when using graphs, there are distinct "graph construction" and "graph execution" (via tf.Session.run) phases. For example, consider this simple snippet:
import tensorflow as tf
element = tf.gather([1, 2], [10])
# No error till this point, error will be thrown when we execute the graph
with tf.Session() as sess:
    print(sess.run(element))
When eager execution is enabled, the operation executes immediately and thus the error is raised earlier:
import tensorflow as tf
tf.enable_eager_execution()
element = tf.gather([1, 2], [10])
These delayed errors when building graphs can make things hard to use, so TensorFlow attempts to catch some errors at graph construction time by running additional checks - like shape validation. For every operation registered in TensorFlow (REGISTER_OP macro in C++), there is a shape inference function that validates the shapes of the inputs and produces the shapes of the outputs as a function of the input shapes. For example, see the definition of the Tile operation (and corresponding shape function).
These shape inference functions are meant to be in sync with the requirements of the operation, which will validate the requirements again at execution time. Ideally, these checks wouldn't have to be repeated between the kernel and the shape inference function, and the error messages would be the same. However, in practice, these checks are repeated (once at graph construction time, once at graph execution time) and the error messages are not always in sync.
When eager execution is enabled, the shape inference functions (and corresponding checks) are skipped since they are redundant (the operation will make the necessary checks anyway). However, as you observed, this means that the error messages can be different - in the tf.tile example above, and I think the same holds for the tf.linspace issue on GitHub. Worth noting that in both cases you'd get an error with the same code in graph and in eager mode - so the intent of debugging in eager and then executing the code as a graph should still hold, the quirks of these additional graph-construction-time checks notwithstanding.
Hope that explains the design and the reason for the difference.
If the error message isn't helpful, I'd suggest filing a bug (or better yet sending a pull request) to improve it. In this particular case, it seems that the kernel implementation could provide a better error message around here.

how to create a tf.layers.Dense object

I want to create a dense layer in tensorflow. I tried tf.layers.dense(input_placeholder, units), which directly creates the layer and returns its output, but what I want is just a "layer module", i.e. an object of the class tf.layers.Dense(units). I want to first declare these modules/layers in a class, and then have several member functions apply1(x, y), apply2(x, y) that use these layers.
But when I did in tensorflow tf.layers.Dense(units), it returned:
layer = tf.layers.Dense(100)
AttributeError: 'module' object has no attribute 'Dense'
But if I do tf.layers.dense(x, units), there's no problem.
Any help is appreciated, thanks.
tf.layers.Dense returns a callable layer object that you later apply to your input; it handles the variable definitions.
func = tf.layers.Dense(out_dim)
out = func(inputs)
tf.layers.dense both defines the variables and applies the dense layer to your input to compute your output.
out = tf.layers.dense(inputs, out_dim)
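If you want the pattern from the question (declare the layers once in a class, then use them from several member functions), a minimal sketch could look like this (class and method names are illustrative):
import tensorflow as tf

class MyBlock(object):
    def __init__(self, units):
        # Layer objects are created once; their variables are built on the first call.
        self.dense1 = tf.layers.Dense(units, activation=tf.nn.relu)
        self.dense2 = tf.layers.Dense(units)

    def apply1(self, x, y):
        # Reuses the same weights for both inputs (x and y assumed same shape).
        return self.dense1(x) + self.dense1(y)

    def apply2(self, x, y):
        return self.dense2(tf.concat([x, y], axis=-1))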
Try to avoid the usage of placeholders; you have to feed_dict them into the tf.Session, so they are probably causing this issue.
Try to use the new Estimator API to load the data and then use dense layers as is done in TensorFlow's GitHub examples: https://github.com/tensorflow/tensorflow/blob/master/tensorflow/examples/tutorials/layers/cnn_mnist.py
tf.layers.Dense was not exported in TensorFlow before version 1.4. You probably have version 1.3 or earlier installed. (You can check the version with python -c 'import tensorflow as tf; print(tf.__version__)'.)

Tensorflow contrib.learn.Estimator multi-GPU

In order to use the contrib.learn.Estimator for multi-GPU training, I am attempting to specify GPU assignments in my model_fn.
In pseudo-code:
def model_fn(X, y):
    with tf.device('/gpu:1'):
        ... various tensorflow ops for model ...
    return predictions, loss, train_op
Everything works fine without the tf.device('/gpu:1') call, but with it I encounter the following error:
InvalidArgumentError (see above for traceback): Cannot assign a device to
node 'save/ShardedFilename_1': Could not satisfy explicit device
specification '/device:GPU:1' because no supported kernel
for GPU devices is available.
I do not believe that I am adding the offending op to the graph myself, but rather that it is injected through the Estimator's snapshot functionality.
I believe that the solution is to set allow_soft_placement=True so that ops without a GPU kernel fall back to the CPU, but it's not obvious to me how that is exposed when dealing with contrib.learn.Estimator.
I see that the option is usually set in ConfigProto & passed to the session, but I've been using the Estimator's functionality to manage the session for me. Should I be taking control of the session creation, or am I missing a parameter somewhere to accomplish this?
Many thanks in advance for any advice.
With Estimator leaving contrib in Tensorflow 1.0, this is fixed.
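For reference, a sketch of how this looks with the core tf.estimator.Estimator (assuming your model_fn is adapted to return a tf.estimator.EstimatorSpec): session options, including allow_soft_placement, are passed through RunConfig's session_config:
import tensorflow as tf

session_config = tf.ConfigProto(allow_soft_placement=True,   # non-GPU ops fall back to CPU
                                log_device_placement=False)
run_config = tf.estimator.RunConfig(session_config=session_config)
estimator = tf.estimator.Estimator(model_fn=model_fn, config=run_config)  # model_fn as above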