ValueError from tensorflow estimator RNNClassifier with gcloud ml-engine job

I am working on the task.py file for submitting a gcloud MLEngine job. Previously I was using tensorflow.estimator.DNNClassifier successfully to submit jobs with my data (which consists solely of 8 columns of sequential numerical data for cryptocurrency prices & volume; no categorical).
I have now switched to the tensorflow contrib estimator RNNClassifier. This is my current code for the relevant portion:
def get_feature_columns():
    return [
        tf.feature_column.numeric_column(feature, shape=(1,))
        for feature in column_names[:len(column_names)-1]
    ]

def build_estimator(config, learning_rate, num_units):
    return tf.contrib.estimator.RNNClassifier(
        sequence_feature_columns=get_feature_columns(),
        num_units=num_units,
        cell_type='lstm',
        rnn_cell_fn=None,
        optimizer=tf.train.AdamOptimizer(learning_rate=learning_rate),
        config=config)

estimator = build_estimator(
    config=run_config,
    learning_rate=args.learning_rate,
    num_units=[32, 16])

tf.estimator.train_and_evaluate(estimator, train_spec, eval_spec)
However, I'm getting the following ValueError:
ValueError: All feature_columns must be of type _SequenceDenseColumn. You can wrap a sequence_categorical_column with an embedding_column or indicator_column. Given (type <class 'tensorflow.python.feature_column.feature_column_v2.NumericColumn'>): NumericColumn(key='LTCUSD_close', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None)
I don't understand this, as the data is not categorical.

As @Ben7 pointed out, sequence_feature_columns accepts columns like sequence_numeric_column. However, according to the documentation, RNNClassifier's sequence_feature_columns expects SparseTensors, while sequence_numeric_column produces a dense tensor. This seems contradictory.
Here is a workaround I used to solve this issue (I took the to_sparse_tensor function from this answer):
def to_sparse_tensor(dense):
    # sequence_numeric_column default is float32
    zero = tf.constant(0.0, dtype=tf.dtypes.float32)
    where = tf.not_equal(dense, zero)
    indices = tf.where(where)
    values = tf.gather_nd(dense, indices)
    return tf.SparseTensor(indices, values, tf.shape(dense, out_type=tf.dtypes.int64))

def get_feature_columns():
    return [
        tf.feature_column.sequence_numeric_column(feature, shape=(1,), normalizer_fn=to_sparse_tensor)
        for feature in column_names[:len(column_names)-1]
    ]

You got this error because you used a numeric feature column, whereas this kind of estimator can only accept sequence feature columns, as you can see in its __init__ function.
So, instead of using a numeric column, you have to use sequence_numeric_column.
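For concreteness, here is a minimal sketch of that suggestion. The column_names list is a hypothetical stand-in for the one defined in the question's task.py, and depending on the TF version you may still need the sparse-tensor workaround shown above:

import tensorflow as tf

# hypothetical column list; in the question these come from the CSV header,
# with the label as the last entry
column_names = ['LTCUSD_close', 'LTCUSD_volume', 'label']

def get_feature_columns():
    # sequence_numeric_column yields the sequence column type that
    # RNNClassifier's sequence_feature_columns argument expects,
    # unlike the plain numeric_column used originally
    return [
        tf.feature_column.sequence_numeric_column(feature, shape=(1,))
        for feature in column_names[:len(column_names) - 1]
    ]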

Related

K-Means of Tensorflow - Graph disconnected error

I am trying to write a function that runs KMeans on a dataset and outputs the cluster centroids. My aim is to use this in a custom keras layer, so I am using TensorFlow's implementation of KMeans that takes a tensor as the input dataset.
My problem however is that I can't make it work even as a standalone function. The problem comes from the fact that KMeans accepts a generator function that provides mini-batches instead of a plain tensor, but when I use a closure to do that, I get a graph disconnected error:
import tensorflow as tf  # version: 2.4.1
from tensorflow.compat.v1.estimator.experimental import KMeans

@tf.function
def KMeansCentroids(inputs, num_clusters, steps, use_mini_batch=False):
    # `inputs` is a 2D tensor
    def input_fn():
        # Each one of the lines below results in the same "Graph Disconnected"
        # error. The tuples aren't really needed, but are kept to be consistent
        # with the documentation.
        return (inputs, None)
        return (tf.data.Dataset.from_tensor_slices(inputs), None)
        return (tf.convert_to_tensor(inputs), None)
    kmeans = KMeans(
        num_clusters=num_clusters,
        use_mini_batch=use_mini_batch)
    kmeans.train(input_fn, steps=steps)  # This is where the error happens
    return kmeans.cluster_centers()

>>> x = tf.random.uniform((100, 2))
>>> c = KMeansCentroids(x, 5, 10)
The exact error is:
ValueError:
Tensor("strided_slice:0", shape=(), dtype=int32)
must be from the same graph as
Tensor("Equal:0", shape=(), dtype=bool)
(graphs are FuncGraph(name=KMeansCentroids, id=..) and <tensorflow.python.framework.ops.Graph object at ...>).
If I were to use a numpy dataset and convert to tensor inside the function, the code would work just fine.
Also, making input_fn() return tf.random.uniform((100, 2)) directly (ignoring the inputs argument) would again work. That's why I am guessing that TensorFlow doesn't support closures here, since it needs to build the computation graph at the beginning.
But I don't see how to work around that.
Could it be a version error due to KMeans being a compat.v1.experimental module?
Note that the documentation of KMeans states for the input_fn():
The function should construct and return one of the following:
A tf.data.Dataset object: Outputs of Dataset object must be a tuple (features, labels) with same constraints as below.
A tuple (features, labels): Where features is a tf.Tensor or a dictionary of string feature name to Tensor and labels is a Tensor or a dictionary of string label name to Tensor. Both features and labels are consumed by model_fn. They should satisfy the expectation of model_fn from inputs.
The problem you're facing is more about invoking tensors outside the created graph. Basically, when you call the .train function, a new graph is created, made up of the graph defined in that input_fn and the graph defined in the model_fn.
kmeans.train(input_fn, steps=steps)
After that, any tensor that comes from outside these functions is treated as an outsider and is not part of this new graph. That's why you're getting a graph disconnected error when trying to use an outsider tensor. To resolve this, you need to create the necessary tensors inside these functions, so that they belong to the new graph.
import tensorflow as tf
from tensorflow.compat.v1.estimator.experimental import KMeans

@tf.function
def KMeansCentroids(num_clusters, steps, use_mini_batch=False):
    def input_fn(batch_size):
        pinputs = tf.random.uniform((100, 2))
        dataset = tf.data.Dataset.from_tensor_slices((pinputs))
        dataset = dataset.shuffle(1000).repeat()
        return dataset.batch(batch_size)
    kmeans = KMeans(
        num_clusters=num_clusters,
        use_mini_batch=use_mini_batch)
    kmeans.train(input_fn=lambda: input_fn(5),
                 steps=steps)
    return kmeans.cluster_centers()

c = KMeansCentroids(5, 10)
FYI, I tested your code with a few versions of TF > 2, and I don't think the issue is related to a version error.
Re-mentioning here for future readers, here are some alternatives for using KMeans within Keras layers:
tf_kmeans.py
ClusteringLayer

How do I create multiple custom AUC metrics, one for each of the outputs, in TensorFlow?

In TensorFlow 2.0, there's the class tf.keras.metrics.AUC. It can easily be added to the list of metrics of the compile method as follows.
# Example taken from the documentation
model.compile('sgd', loss='mse', metrics=[tf.keras.metrics.AUC()])
However, in my case, the output of my neural network is an NxM tensor, where N is the batch size and M is the number of separate outputs. I would like to compute the AUC metric for each of these M outputs separately (across all N instances of the batch). So, there should be M AUC metrics, each computed from N observations. I tried to create a custom metric, but I am facing some issues. The following is my first attempt.
def get_custom_auc(output):
    auc = tf.metrics.AUC()

    @tf.function
    def custom_auc(y_true, y_pred):
        y_true = y_true[:, output]
        y_pred = y_pred[:, output]
        auc.update_state(y_true, y_pred)
        return auc.result()

    custom_auc.__name__ = "custom_auc_" + str(output)
    return custom_auc
The need to rename custom_auc.__name__ is described in the following post: Is it possible to have a metric that returns an array (or tensor) rather than a number?. However, this implementation raises an error.
tensorflow.python.framework.errors_impl.InvalidArgumentError: assertion failed: [predictions must be >= 0] [Condition x >= y did not hold element-wise:] [x (strided_slice_1:0) = ] [3.14020467 3.06779885 2.86414027...] [y (Cast_1/x:0) = ] [0]
[[{{node metrics/custom_auc_2/StatefulPartitionedCall/assert_greater_equal/Assert/AssertGuard/else/_161/Assert}}]] [Op:__inference_keras_scratch_graph_5149]
I have also tried to create the AUC object inside custom_auc, but this is not possible because I am using @tf.function, so I get the error ValueError: tf.function-decorated function tried to create variables on non-first call. Even if I remove the @tf.function (which I may need because I may use some if-else statements inside the implementation), I get another error
tensorflow.python.framework.errors_impl.FailedPreconditionError: Error while reading resource variable _AnonymousVar33 from Container: localhost. This could mean that the variable was uninitialized. Not found: Resource localhost/_AnonymousVar33/N10tensorflow3VarE does not exist.
[[node metrics/custom_auc_0/add/ReadVariableOp (defined at /train.py:173) ]] [Op:__inference_keras_scratch_graph_5174]
Note that, currently, I am adding these AUC metrics, one for each of the M outputs, as described in this answer. Furthermore, I cannot simply return the object auc, because apparently Keras expects the output of the custom metric to be a tensor and not an AUC object. So, if you do that, you get the following error.
TypeError: To be compatible with tf.contrib.eager.defun, Python functions must return zero or more Tensors; in compilation of .custom_auc at 0x1862e6680>, found return value of type , which is not a Tensor.
I've also tried to implement a custom metric class as follows.
class CustomAUC(tf.metrics.Metric):
    def __init__(self, num_outputs, name="custom_auc", **kwargs):
        super(CustomAUC, self).__init__(name=name, **kwargs)
        assert num_outputs >= 1
        self.num_outputs = num_outputs
        self.aucs = [tf.metrics.AUC() for _ in range(self.num_outputs)]

    def update_state(self, y_true, y_pred, sample_weight=None):
        for output in range(self.num_outputs):
            y_true1 = y_true[:, output]
            y_pred1 = y_pred[:, output]
            self.aucs[output].update_state(y_true1, y_pred1)

    def result(self):
        return [auc.result() for auc in self.aucs]
However, I am currently getting the error
ValueError: Shapes (200,) and () are incompatible
This error seems to be related to reset_states, so maybe I should also override this method. In fact, if I override reset_states with the following implementation
def reset_states(self):
    for auc in self.aucs:
        auc.reset_states()
I don't get this error anymore, but I get another error
tensorflow.python.framework.errors_impl.InvalidArgumentError: assertion failed: [predictions must be >= 0] [Condition x >= y did not hold element-wise:] [x (strided_slice_1:0) = ] [-1.38822043 1.24234951 -0.254447281...] [y (Cast_1/x:0) = ] [0]
[[{{node metrics/custom_auc/PartitionedFunctionCall/assert_greater_equal/Assert/AssertGuard/else/_98/Assert}}]] [Op:__inference_keras_scratch_graph_5248]
So, how do I implement this custom AUC metric, one for each of the M outputs of the network? Basically, I want to do something similar to the solution described in this answer, but with the AUC metric.
I have also opened the related issue on the TensorFlow's Github issue tracker.
I had a similar problem to yours. I have a model with 3 outputs and I want to compute a custom metric (ConfusionMatricMetric) for the 3 outputs (which each have a different number of classes). I used the solution from https://keras.io/guides/customizing_what_happens_in_fit/ - Going lower level. My problem then was that I couldn't train the model because of
ValueError: tf.function-decorated function tried to create variables on non-first call.
Then I used
tf.config.run_functions_eagerly(True)
and now the model trains, very slowly, but it can be saved.
P.S. I also used tf.keras.metrics.KLDivergence() instead of my custom metric and reproduced the same experiment with the same result as above - trained & saved (tf.saved_model.save)
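For future readers, here is one possible sketch of the original question (one AUC per output column) that avoids the function-renaming approach: subclass tf.keras.metrics.AUC and slice inside update_state. The class name SlicedAUC is made up, and the sketch assumes each of the M outputs is already a probability in [0, 1] (e.g. sigmoid-activated), which is also what the "predictions must be >= 0" assertions quoted above are complaining about:

import tensorflow as tf

class SlicedAUC(tf.keras.metrics.AUC):
    """AUC over one column of an N x M prediction tensor.

    Assumes y_pred[:, index] is already a probability in [0, 1];
    otherwise AUC's internal "predictions must be >= 0" check fails.
    """

    def __init__(self, index, name=None, **kwargs):
        super().__init__(name=name or "auc_%d" % index, **kwargs)
        self.index = index

    def update_state(self, y_true, y_pred, sample_weight=None):
        # slice out the column for this output before updating the parent AUC
        return super().update_state(
            y_true[:, self.index], y_pred[:, self.index],
            sample_weight=sample_weight)

    def get_config(self):
        # keep the extra constructor argument serializable
        config = super().get_config()
        config["index"] = self.index
        return config

# one metric instance per output column (M assumed known), e.g.:
# model.compile('sgd', loss='mse', metrics=[SlicedAUC(i) for i in range(M)])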

Reshape tensor using placeholder value

I want to reshape a tensor using the [int, -1] notation (to flatten an image, for example), but I don't know the first dimension ahead of time. One use case is training on a large batch, then evaluating on a smaller batch.
Why does this give the following error: got list containing Tensors of type '_Message'?
import tensorflow as tf
import numpy as np

x = tf.placeholder(tf.float32, shape=[None, 28, 28])
batch_size = tf.placeholder(tf.int32)

def reshape(_batch_size):
    return tf.reshape(x, [_batch_size, -1])

reshaped = reshape(batch_size)

with tf.Session() as sess:
    sess.run([reshaped], feed_dict={x: np.random.rand(100, 28, 28), batch_size: 100})
    # Evaluate
    sess.run([reshaped], feed_dict={x: np.random.rand(8, 28, 28), batch_size: 8})
Note: when I have the reshape outside of the function it seems to work, but I have very large models that I use multiple times, so I need to keep them in a function and pass the dim using an argument.
To make this work, replace the function:
def reshape(_batch_size):
    return tf.reshape(x, [_batch_size, -1])
…with the function:
def reshape(_batch_size):
    return tf.reshape(x, tf.pack([_batch_size, -1]))
The reason for the error is that tf.reshape() expects a value that is convertible to a tf.Tensor as its second argument. TensorFlow will automatically convert a list of Python numbers to a tf.Tensor but will not automatically convert a mixed list of numbers and tensors (such as a tf.placeholder())—instead raising the somewhat unintuitive error message you saw.
The tf.pack() op takes a list of objects convertible to a tensor, and converts each element individually, so it can handle the combination of a placeholder and an integer.
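As a side note, in TensorFlow 1.0 and later tf.pack was renamed tf.stack, so the equivalent of the fix above would read:

def reshape(_batch_size):
    # tf.stack is the TF 1.0+ name for tf.pack; it likewise converts each list
    # element to a tensor individually, so mixing a placeholder and -1 works
    return tf.reshape(x, tf.stack([_batch_size, -1]))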
Hi all, the issue is due to the Keras version. I tried all of the above without any success. Uninstall Keras and reinstall it via pip; that worked for me.
I was facing this error with Keras 1.0.2 and resolved it with Keras 1.2.0.
Hope this will help. Thank you

In Tensorflow, how do I generate a scalar summary?

Does anyone have a minimal example of using a SummaryWriter with a scalar_summary in order to see (say) a cross entropy result during a training run?
The example given in the documentation:
merged_summary_op = tf.merge_all_summaries()
summary_writer = tf.train.SummaryWriter('/tmp/mnist_logs', sess.graph_def)
total_step = 0
while training:
    total_step += 1
    session.run(training_op)
    if total_step % 100 == 0:
        summary_str = session.run(merged_summary_op)
        summary_writer.add_summary(summary_str, total_step)
When I run it, this returns an error: TypeError: Fetch argument None of None has invalid type, must be a string or Tensor. (Can not convert a NoneType into a Tensor or Operation.)
If I add a:
tf.scalar_summary('cross entropy', cross_entropy)
operation after my cross entropy calculation, then instead I get the error:
InvalidArgumentError: You must feed a value for placeholder tensor 'Placeholder_2' with dtype float
Which suggests that I need to add a feed_dict to the
summary_str = session.run(merged_summary_op)
call, but I am not clear what that feed_dict should contain....
The feed_dict should contain the same values that you use for running the training_op. It basically specifies the input values to your network for which you want to calculate the summaries.
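As a minimal, self-contained sketch of that point, using the pre-1.0 API from this thread (the loss placeholder and the fake loss values here are made up purely so the summary has something to feed; in your code the feed_dict would be the same one you pass when running training_op):

import tensorflow as tf

# a placeholder stands in for the real cross-entropy tensor
loss_value = tf.placeholder(tf.float32, name='loss_value')
tf.scalar_summary('cross entropy', loss_value)
merged_summary_op = tf.merge_all_summaries()

with tf.Session() as sess:
    summary_writer = tf.train.SummaryWriter('/tmp/mnist_logs', sess.graph_def)
    for step in range(300):
        fake_loss = 1.0 / (step + 1)  # stand-in for the value training produces
        if step % 100 == 0:
            # the feed_dict must cover every placeholder the summary op depends
            # on, i.e. the same values you feed when running training_op
            summary_str = sess.run(merged_summary_op,
                                   feed_dict={loss_value: fake_loss})
            summary_writer.add_summary(summary_str, step)
    summary_writer.flush()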
The error is probably coming from:
session.run(training_op)
Did you paste the example code into a version of the mnist code that requires a feed_dict for feeding in training examples? Check the backtrace it gave you (and include it above if that doesn't solve the problem).

tensorflow MNIST fully_connected_feed.py fails: range() takes at least 2 arguments (1 given)

I'm having trouble running the example in one of the TensorFlow tutorials. The tutorial says that to run it I just need to type python fully_connected_feed.py. When I do this it gets through fetching the input data, but then fails, like so:
Extracting data/train-images-idx3-ubyte.gz
Extracting data/train-labels-idx1-ubyte.gz
Extracting data/t10k-images-idx3-ubyte.gz
Extracting data/t10k-labels-idx1-ubyte.gz
Traceback (most recent call last):
  File "fully_connected_feed.py", line 225, in <module>
    tf.app.run()
  File "/Users/me/anaconda/lib/python2.7/site-packages/tensorflow/python/platform/default/_app.py", line 11, in run
    sys.exit(main(sys.argv))
  File "fully_connected_feed.py", line 221, in main
    run_training()
  File "fully_connected_feed.py", line 141, in run_training
    loss = mnist.loss(logits, labels_placeholder)
  File "/Users/me/tftmp/mnist.py", line 96, in loss
    indices = tf.expand_dims(tf.range(batch_size), 1)
TypeError: range() takes at least 2 arguments (1 given)
I think this error is caused because there is some problem with session setup and/or tensor evaluation. This is the function in mnist.py causing the problem:
def loss(logits, labels):
    """Calculates the loss from the logits and the labels.

    Args:
      logits: Logits tensor, float - [batch_size, NUM_CLASSES].
      labels: Labels tensor, int32 - [batch_size].

    Returns:
      loss: Loss tensor of type float.
    """
    # Convert from sparse integer labels in the range [0, NUM_CLASSSES)
    # to 1-hot dense float vectors (that is we will have batch_size vectors,
    # each with NUM_CLASSES values, all of which are 0.0 except there will
    # be a 1.0 in the entry corresponding to the label).
    batch_size = tf.size(labels)
    labels = tf.expand_dims(labels, 1)
    indices = tf.expand_dims(tf.range(batch_size), 1)
    concated = tf.concat(1, [indices, labels])
    onehot_labels = tf.sparse_to_dense(
        concated, tf.pack([batch_size, NUM_CLASSES]), 1.0, 0.0)
    cross_entropy = tf.nn.softmax_cross_entropy_with_logits(logits, onehot_labels,
                                                            name='xentropy')
    loss = tf.reduce_mean(cross_entropy, name='xentropy_mean')
    return loss
If I put all the code in the loss function inside a with tf.Session(): block, it gets past this error. However, I get other errors later about uninitialised variables, so I'm guessing something major is going wrong with session setup or initialisation. Being new to TensorFlow I'm a little at a loss. Any ideas?
[NB: I haven't edited the code at all, just downloaded it from the tensorflow tutorials and tried to run it as instructed, with python fully_connected_feed.py]
This issue arises because in the latest version of the TensorFlow source on GitHub, tf.range() has been updated to be more permissive with its arguments (previously it required two arguments; now it has the same semantics as Python's range() built-in function), and the fully_connected_feed.py example has been updated to exploit this.
However, if you try to run this version against the binary distribution of TensorFlow, you will get this error because the change to tf.range() has not been incorporated into the binary package.
The easiest solution is to download the old version of mnist.py. Alternatively, you could build from source to use the latest version of the tutorial.
You can get the right result by fixing the mnist code like this:
indices = tf.expand_dims(tf.range(0, batch_size), 1)
TypeError: range() takes at least 2 arguments (1 given)
That's the error.
Looking at the tensorflow docs for range, we can see that range has a function signature of start, limit, delta=1, name='range'. This means that at least two arguments are required for function invocation. Your example only shows one argument provided.
An example can be found in the docs:
# 'start' is 3
# 'limit' is 18
# 'delta' is 3
tf.range(start, limit, delta) ==> [3, 6, 9, 12, 15]