Different Evaluation of Seemingly Same Tensors - tensorflow

In my program I have:
run_plain = neural_network_model(x)
run_max = tf.argmax(run_plain, 1)
and
run_network = tf.argmax(neural_network_model(x), 1)
run_max and run_network give me different outputs when executed with the same input, e.g. via run_max.eval({x:[test_x[i]]}).
Am I misunderstanding something fundamental about how TensorFlow's eval() works? In my opinion the results should be the same. Or is there some other error in my code?

Could you post your whole example?
Otherwise based on what has been given there should be no difference between the two examples.
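One situation in which the two can differ, offered as an assumption since the full code is not shown: if neural_network_model creates and randomly initializes its variables every time it is called, then run_max and run_network are built on two separate networks with different weights. The pure-Python sketch below mimics that situation; make_model is a hypothetical stand-in for neural_network_model, not TensorFlow code.

```python
import random

def make_model(rng, n_classes=3):
    """Hypothetical stand-in for neural_network_model: each call draws new weights."""
    weights = [rng.random() for _ in range(n_classes)]

    def forward(x):
        # One "logit" per class, scaled by that call's own weights.
        return [w * x for w in weights]

    return forward

def argmax(values):
    # Index of the largest value, like tf.argmax over one row.
    return max(range(len(values)), key=lambda i: values[i])

rng = random.Random(0)
run_plain = make_model(rng)      # first call: first set of weights
run_network = make_model(rng)    # second call: a different set of weights

same_model_twice = run_plain(1.0) == run_plain(1.0)
two_models_agree = run_plain(1.0) == run_network(1.0)
```

If this is what is happening, building the model once and reusing run_plain for every downstream op would keep the outputs consistent.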

Related

Do the number of units in a layer need to be defined within a conditional scope when using keras tuner to setup a model?

According to the Keras Tuner examples here and here, if you want to define the number of layers and each layer's units in a deep learning model using hyper parameters you do something like this:
for i in range(hp.Int('num_layers', 1, 10)):
    model.add(layers.Dense(units=hp.Int('unit_' + str(i), 32, 512, 32)))
However, as others have noted here and here after the oracle has seen a model with num_layers = 10 it will always assign a value to unit_0 through unit_9, even when num_layers is less than 10.
In the case that num_layers = 1 for example, only unit_0 will be used to build the model. But, unit_1 through unit_9 will be defined and active in the hyper parameters.
Does the oracle "know" that unit_1 through unit_9 weren't actually used to build the model (and therefore disregard their relevance for impacting the results of that trial)?
Or, does it assume unit_1 through unit_9 are being used because they have been defined (and calling hp.get('unit_9') for example will return a value)?
In the latter case the oracle is using misinformation to drive the tuning process. As a result it will take longer to converge (at best) and incorrectly converge to a solution as a result of assigning relevance to the unused hyper parameters (at worst).
Should the model actually be defined using conditional scopes, like this?
num_layers = hp.Int('num_layers', 1, 10)
for i in range(num_layers):
    with hp.conditional_scope('num_layers', list(range(i + 1, 10 + 1))):
        model.add(layers.Dense(units=hp.Int('unit_' + str(i), 32, 512, 32)))
When defining the model like this, if num_layers < 10, calling hp.get('unit_9') will raise a ValueError: Conditional parameter unit_9 is not currently active, as expected.
Using conditional scope is the best as it correctly recognizes active parameters. Without using conditional scope it is, at least at the moment, not possible to let the tuner know what parameters are actually used.
However, with RandomSearch the simpler approach (which leaves inactive parameters defined) should give exactly the same result. When starting a new trial the tuner will go through all possibilities, but will reject the invalid ones before actually starting the trial.
For the existing tuners I think only Bayesian is strongly affected by this. I am not 100% sure about the case of Hyperband; but for RandomSearch the two approaches are exactly the same (except for displaying inactive parameters that make people confused).
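To make the parent-value lists in the conditional-scope version concrete, here is a small pure-Python sketch. It does not call Keras Tuner; parent_values_for and active_units are hypothetical helpers that only reproduce the list(range(i + 1, 10 + 1)) logic from the question.

```python
MAX_LAYERS = 10

def parent_values_for(i, max_layers=MAX_LAYERS):
    # Values of num_layers for which layer i exists (i is zero-based).
    return list(range(i + 1, max_layers + 1))

def active_units(num_layers, max_layers=MAX_LAYERS):
    # Names of the unit_* parameters that should be active for this num_layers.
    return [
        "unit_" + str(i)
        for i in range(max_layers)
        if num_layers in parent_values_for(i, max_layers)
    ]

one_layer = active_units(1)
three_layers = active_units(3)
```

With num_layers = 1 only unit_0 is active, and unit_9 is active only when num_layers is 10, which matches the behaviour the conditional scope is meant to enforce.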

How to use tensorflow's FFT?

I am having some trouble reconciling my FFT results from MATLAB and TF. The results are actually very different. Here is what I have done:
1). I would attach my data file here but didn't find a way to do so. Anyways, my data is stored in a .mat file, and the variable we will work with is called 'TD'. In MATLAB, I first subtract the mean of the data, and then perform fft:
f_hat = TD-mean(TD);
x = fft(f_hat);
2). In TF, I use tf.math.reduce_mean to calculate the mean, and it differs from MATLAB's mean only on the order of 10^-8. So in TF I have:
mean_TD = tf.reduce_mean(TD)
f_hat_int = TD - mean_TD
f_hat_tf = tf.dtypes.cast(f_hat_int,tf.complex64)
x_tf = tf.signal.fft(f_hat_tf)
So up until 'f_hat' and 'f_hat_tf', the difference is very slight and is caused only by the difference in the mean. However, x and x_tf are very different. Am I using TF's FFT incorrectly?
Thanks!
Picture showing the difference
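Two things may be worth checking here; both are assumptions, since the data is not shown. tf.signal.fft transforms over the innermost (last) dimension, while MATLAB's fft works along the first non-singleton dimension, so for a matrix it transforms each column. Also, tf.complex64 is single precision, whereas MATLAB defaults to double. The pure-Python sketch below uses a naive DFT (the dft helper is illustrative, not a TensorFlow or MATLAB call) to show how much the axis choice alone can change the result.

```python
import cmath

def dft(signal):
    """Naive O(n^2) discrete Fourier transform of a 1-D sequence."""
    n = len(signal)
    return [
        sum(signal[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]

def dft_rows(matrix):
    # Transform along the last axis (one DFT per row).
    return [dft(row) for row in matrix]

def dft_columns(matrix):
    # Transform along the first axis (one DFT per column), kept in the original layout.
    transformed = [dft(list(col)) for col in zip(*matrix)]
    return [list(row) for row in zip(*transformed)]

data = [[1.0, 2.0],
        [3.0, 5.0]]
by_rows = dft_rows(data)        # what a last-axis FFT would compute
by_columns = dft_columns(data)  # what a column-wise FFT would compute
```

If TD is stored as a column vector or a matrix, transposing it before calling tf.signal.fft (or comparing against fft(f_hat, [], 2) in MATLAB) would be one way to test this.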

In Tensorflow-Serving, is it possible to get only the top-k prediction results?

When using the code in https://www.tensorflow.org/serving, but with a DNNClassifier Estimator model, the curl/query request returns all the possible label classes and their associated scores.
Using a model with 100,000+ possible output/label classes, the response becomes too large. Is there any way to limit the number of outputs to the top-k results? (Similar to how it can be done in keras).
The only possibility I could think of is feeding some parameter into the predict API through the signatures, but I haven't found any parameters that would give this functionality. I've read through a ton of documentation + code and googled a ton, but to no avail.
Any help would be greatly appreciated. Thanks in advance for any responses. <3
AFAIC, there are 2 ways to support your need.
You could add some lines in tensorflow-serving source code referring to this
You could do something like this while training/retraining your model.
Hope this will help.
Putting this up here in case it helps anyone. It's possible to override the classification_output() function in head.py (which is used by dnn.py) in order to filter the top-k results. You can insert this snippet into your main.py / train.py file, and whenever you save a DNNClassifier model, that model will always output at most num_top_k_results when doing inference/serving. The vast majority of the method is copied from the original classification_output() function. (Note this may or may not work with 1.13 / 2.0 as it hasn't been tested on those.)
# The internal import paths below follow TF 1.x (as used by head.py); they are an assumption and may differ between versions.
import tensorflow as tf
from tensorflow.python.estimator.canned import head as head_lib
from tensorflow.python.estimator.export import export_output
from tensorflow.python.ops import array_ops
from tensorflow.python.ops import math_ops
from tensorflow.python.ops import string_ops
num_top_k_results = 5
def override_classification_output(scores, n_classes, label_vocabulary=None):
    batch_size = array_ops.shape(scores)[0]
    if label_vocabulary:
        export_class_list = label_vocabulary
    else:
        export_class_list = string_ops.as_string(math_ops.range(n_classes))
    # Get the top_k results
    top_k_scores, top_k_indices = tf.nn.top_k(scores, num_top_k_results)
    # Using the top_k_indices, get the associated class names (from the vocabulary)
    top_k_classes = tf.gather(tf.convert_to_tensor(value=export_class_list), tf.squeeze(top_k_indices))
    export_output_classes = array_ops.tile(
        input=array_ops.expand_dims(input=top_k_classes, axis=0),
        multiples=[batch_size, 1])
    return export_output.ClassificationOutput(
        scores=top_k_scores,
        # `ClassificationOutput` requires string classes.
        classes=export_output_classes)
# Override the original method with our custom one.
head_lib._classification_output = override_classification_output
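As a plain-Python illustration of what the tf.nn.top_k and tf.gather steps above compute (a sketch of the idea only, not the TensorFlow implementation; top_k_with_labels is a hypothetical helper), each row of scores is reduced to its k largest entries together with their class labels:

```python
import heapq

def top_k_with_labels(scores, labels, k):
    """Return (top scores, matching labels) for one row of class scores."""
    # Indices of the k largest scores, highest first.
    top_indices = heapq.nlargest(k, range(len(scores)), key=lambda i: scores[i])
    return [scores[i] for i in top_indices], [labels[i] for i in top_indices]

batch_scores = [[0.1, 0.6, 0.3], [0.5, 0.2, 0.3]]
labels = ["cat", "dog", "bird"]
results = [top_k_with_labels(row, labels, k=2) for row in batch_scores]
```

The label lookup here is done per row. The snippet above gathers with tf.squeeze(top_k_indices) and then tiles by batch size, so it appears to assume a batch of one; larger batches may need a per-row gather instead.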

Reinforcement learning a3c with multiple independent outputs

I am attempting to modify and implement Google's pattern of the Asynchronous Advantage Actor Critic (A3C) model. There are plenty of examples online that have gotten me started, but I am running into issues attempting to expand the samples.
All of the examples I can find focus on pong as the example which has a state based output of left or right or stay still. What I am trying to expand this to is a system that also has a separate on off output. In the context of pong, it would be a boost to your speed.
The code I am basing my code on can be found here. It is playing doom, but it still has the same left and right but also a fire button instead of stay still. I am looking at how I could modify this code such that fire was an independent action from movement.
I know I can easily add another separate output from the model so that the outputs would look something like this:
self.output = slim.fully_connected(rnn_out,a_size,
    activation_fn=tf.nn.softmax,
    weights_initializer=normalized_columns_initializer(0.01),
    biases_initializer=None)
self.output2 = slim.fully_connected(rnn_out,1,
    activation_fn=tf.nn.sigmoid,
    weights_initializer=normalized_columns_initializer(0.01),
    biases_initializer=None)
The thing I am struggling with is how I then have to modify the value output and redefine the loss function. Is the value still tied to the combination of the two outputs, or is there a separate value output for each of the independent outputs? I feel like there should still be only one value output, but I am unsure how I then use that one value and modify the loss function to take this into account.
I was thinking of adding a separate term to the loss function so that the calculation would look something like this:
self.actions_1 = tf.placeholder(shape=[None],dtype=tf.int32)
self.actions_2 = tf.placeholder(shape=[None],dtype=tf.float32)
self.actions_onehot = tf.one_hot(self.actions_1,a_size,dtype=tf.float32)
self.target_v = tf.placeholder(shape=[None],dtype=tf.float32)
self.advantages = tf.placeholder(shape=[None],dtype=tf.float32)
self.responsible_outputs = tf.reduce_sum(self.output1 * self.actions_onehot, [1])
self.responsible_outputs_2 = tf.reduce_sum(self.output2 * self.actions_2, [1])
#Loss functions
self.value_loss = 0.5 * tf.reduce_sum(tf.square(self.target_v - tf.reshape(self.value,[-1])))
self.entropy = - tf.reduce_sum(self.policy * tf.log(self.policy))
# Parentheses let the expression continue onto the next line.
self.policy_loss = (-tf.reduce_sum(tf.log(self.responsible_outputs)*self.advantages)
                    - tf.reduce_sum(tf.log(self.responsible_outputs_2)*self.advantages))
self.loss = 0.5 * self.value_loss + self.policy_loss - self.entropy * 0.01
I am looking to know if I am on the right track here, or if there are resources or examples that I can expand off of.
First of all, the example you mention doesn't need two output nodes. One output node with a continuous output value is enough. Also, you shouldn't use a placeholder for the advantage; use one for the discounted reward instead and compute the advantage from it.
self.discounted_reward = tf.placeholder(shape=[None],dtype=tf.float32)
self.advantages = self.discounted_reward - self.value
Also while calculating the policy loss you have to use tf.stop_gradient to prevent the value node gradient feedback contribution for policy learning.
self.policy_loss = -tf.reduce_sum(tf.log(self.responsible_outputs)*tf.stop_gradient(self.advantages))
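If the two outputs are kept as separate heads, one common approach (an assumption here, not something stated in the answer above) is to treat the two actions as independent given the state. The joint log-probability is then the sum of a categorical term and a Bernoulli term, both weighted by the same advantage from the single value output. A minimal pure-Python sketch, with hypothetical helper names:

```python
import math

def categorical_log_prob(probs, action):
    # log pi_1(a1 | s) for a softmax head; action is an integer index.
    return math.log(probs[action])

def bernoulli_log_prob(p_on, action):
    # log pi_2(a2 | s) for a sigmoid head; action is 0 (off) or 1 (on).
    return math.log(p_on) if action == 1 else math.log(1.0 - p_on)

def policy_loss(move_probs, fire_probs, moves, fires, advantages):
    """Negative advantage-weighted joint log-probability, summed over steps."""
    total = 0.0
    for probs, p_on, a1, a2, adv in zip(move_probs, fire_probs, moves, fires, advantages):
        joint_log_prob = categorical_log_prob(probs, a1) + bernoulli_log_prob(p_on, a2)
        total -= joint_log_prob * adv  # the advantage is treated as a constant here
    return total

loss = policy_loss(
    move_probs=[[0.5, 0.25, 0.25]],
    fire_probs=[0.25],
    moves=[0],
    fires=[0],
    advantages=[2.0],
)
```

Multiplying the sigmoid output by a 0/1 action, as in responsible_outputs_2 in the question, yields log(0) whenever the action is off; the Bernoulli term avoids that by using log(1 - p) in that case.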

How is tf.summary.tensor_summary meant to be used?

TensorFlow provides a tf.summary.tensor_summary() function that appears to be a multidimensional variant of tf.summary.scalar():
tf.summary.tensor_summary(name, tensor, summary_description=None, collections=None)
I thought it could be useful for summarizing inferred probabilities per class ... somewhat like
op_summary = tf.summary.tensor_summary('classes', some_tensor)
# ...
summary = sess.run(op_summary)
writer.add_summary(summary)
However it appears that TensorBoard doesn't provide a way to display these summaries at all. How are they meant to be used?
I cannot get it to work either. It seems like that feature is still under development. See this video from the TensorFlow Dev Summit that states that the tensor_summary is still under development (starting at 9:17): https://youtu.be/eBbEDRsCmv4?t=9m17s. It will probably be better defined and examples should be provided in the future.