TFlearn evaluate method results meaning - tensorflow

I thought that TFlearn's evaluate method returns the accuracy of the model (0 to 1), but after training my model, model.evaluate(test_x, test_y) returns a value > 1 (1.003626), so now I'm not sure what exactly it returns.
Can anyone explain?

The evaluate method returns a dict, so the call would be
model.evaluate(test_x, test_y)['accuracy']
but I'm guessing that's not the problem. If you are doing classification, the test labels have to be integers for this to work. Other than that, without seeing more of your code, it's hard to debug.
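Concretely, a minimal sketch of reading the metric out of that dict (the exact keys depend on how the estimator's metrics are configured):

metrics = model.evaluate(test_x, test_y)
# evaluate() returns a dict of metrics, not a single number,
# e.g. something like {'accuracy': ..., 'loss': ..., 'global_step': ...}
print(metrics['accuracy'])  # the 0-to-1 accuracy the question expects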
Comments from the source code for evaluate:
Args:
  x: Matrix of shape [n_samples, n_features...] or dictionary of many matrices
    containing the input samples for fitting the model. Can be iterator that returns
    arrays of features or dictionary of array of features. If set, input_fn must
    be None.
  y: Vector or matrix [n_samples] or [n_samples, n_outputs] containing the
    label values (class labels in classification, real numbers in
    regression) or dictionary of multiple vectors/matrices. Can be iterator
    that returns array of targets or dictionary of array of targets. If set,
    input_fn must be None. Note: For classification, label values must
    be integers representing the class index (i.e. values from 0 to
    n_classes-1).


Keras custom loss with dynamic variable for slicing

First, I would like to say that I only have a little experience with Keras/TensorFlow and probably lack some understanding of tensor manipulation.
I am using a model whose input is an "oversized" matrix (NxN). That is, I feed it data that can be smaller (i.e. KxK, K <= N), where the "missing" data (needed to fill out the NxN shape) is padded with zeros. The output is an encoded version (Nx2) of the input.
I'm using a custom loss function that I would like to be computed only on the first (Kx2) values of the model's output. To do so, I think the solution is to "slice" the y_pred tensor in my loss function, since I don't want to simply mask it with a boolean tensor. However, I can't figure out how to pass K as a dynamic argument to my custom loss.
Wrapping the loss function within another function that takes an argument does not fit my needs, since the K value changes for each data sample (see the sketch below).
Passing K in the model's input and getting it back through a function wrapper (e.g. https://stackoverflow.com/a/55445837/6315123), as mentioned in the first point, does not work either, since slices cannot be computed from a Tensor (as far as I understand), and evaluating the tensor within the loss function doesn't seem possible.
How can I pass such an argument to my loss function?
Thanks!
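For reference, a minimal sketch of the fixed-K "wrapper" pattern the question rules out: k is baked in when the model is compiled, which is exactly why it cannot vary per sample. The shapes and the k value here are hypothetical.

import tensorflow as tf

def make_sliced_loss(k):
    # k is captured by the closure once, at compile time.
    def loss(y_true, y_pred):
        # Compare only the first k of the N output rows (shape: batch x N x 2).
        return tf.reduce_mean(tf.square(y_true[:, :k, :] - y_pred[:, :k, :]))
    return loss

# model.compile(optimizer='adam', loss=make_sliced_loss(k=16))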

Meaning and dimensions of tf.contrib.learn.DNNClassifier's extracted weights and biases

I am relatively new to TensorFlow, but even with a lot of research I was unable to find documentation of what certain variables mean.
For my current project, I want to train a DNN with the help of TensorFlow, and afterwards I want to extract the weight and bias matrices from it to use them in another application OUTSIDE TensorFlow. For a first try, I set up a simple network with a [4, 10, 2] structure, which predicts a binary outcome.
I used 3 real_valued_columns and a single sparse_column_with_keys (wrapped in an embedding_column) as features:
def build_estimator(optimizer=None, activation_fn=tf.sigmoid):
    """Build an estimator"""
    # Sparse base columns
    column_stay_point = tf.contrib.layers.sparse_column_with_keys(
        column_name='stay_point',
        keys=['no', 'yes'])
    # Continuous base columns
    column_heading = tf.contrib.layers.real_valued_column('heading')
    column_velocity = tf.contrib.layers.real_valued_column('velocity')
    column_acceleration = tf.contrib.layers.real_valued_column('acceleration')
    pedestrian_feature_columns = [
        column_heading,
        column_velocity,
        column_acceleration,
        tf.contrib.layers.embedding_column(
            column_stay_point,
            dimension=8,
            initializer=tf.truncated_normal_initializer)]
    # Create classifier
    estimator = tf.contrib.learn.DNNClassifier(
        hidden_units=[10],
        feature_columns=pedestrian_feature_columns,
        model_dir='./tmp/pedestrian_model',
        n_classes=2,
        optimizer=optimizer,
        activation_fn=activation_fn)
    return estimator
I called this function with default arguments and used estimator.fit(...) to train the DNN. Aside from some warnings concerning the deprecated 'scalar_summary' function, it ran successfully and produced reasonable results. I printed all variables of the model by using the following line:
var = {k: estimator.get_variable_value(k) for k in estimator.get_variable_names()}
I expected to get weight matrices of size 10x4 and 2x10 as well as bias matrices of size 10x1 and 2x1. But I got the following:
'dnn/binary_logistic_head/dnn/learning_rate': 0.05 (actual value, scalar)
'dnn/input_from_feature_columns/stay_point_embedding/weights': 2x8 array
'dnn/hiddenlayer_0/weights/hiddenlayer_0/weights/part_0/Adagrad': 11x10 array
'dnn/input_from_feature_columns/stay_point_embedding/weights/int_embedding/weights/part_0/Adagrad': 2x8 array
'dnn/hiddenlayer_0/weights': 11x10 array
'dnn/logits/biases': 1x1 array
'dnn/logits/weights/nn/dnn/logits/weights/part_0/Adagrad': 10x1 array
'dnn/logits/weights': 10x1 array
'dnn/logits/biases/dnn/dnn/logits/biases/part_0/Adagrad': 1x1 array
'global_step': 5800 (actual value, scalar)
'dnn/hiddenlayer_0/biases': 1x10 array
'dnn/hiddenlayer_0/biases//hiddenlayer_0/biases/part_0/Adagrad': 1x10 array
Is there any documentation on what these cryptic names mean, and why do the matrices have these dimensions? Also, why are there references to the Adagrad optimizer despite my never specifying it?
Any help is highly appreciated!
The number of input nodes in your network is 11, not 4:
8 (embedding_column) + column_heading (1) + column_velocity (1) + column_acceleration (1) = 11
And based on the variable names, the output is a binary logistic node, so the number of output nodes is only one, not 2.
Below are the weights/biases you are interested in:
'dnn/hiddenlayer_0/weights': 11x10 array --> the weights from the inputs to the hidden nodes
'dnn/hiddenlayer_0/biases': 1x10 array --> the biases of the hidden nodes
'dnn/logits/weights': 10x1 array --> the weights from the hidden nodes to the output node
'dnn/logits/biases': 1x1 array --> the bias of the output node
why are there references to the Adagrad optimizer despite never specifying it?
Most probably because Adagrad is the default optimizer for tf.contrib.learn.DNNClassifier, so its accumulator variables are created even though you never specified it.
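To use the extracted matrices outside TensorFlow, the forward pass can be reproduced in plain NumPy. A minimal, hypothetical sketch, assuming the var dict printed above and the default tf.sigmoid activation from build_estimator:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Names and shapes taken from the variable listing above.
W0 = var['dnn/hiddenlayer_0/weights']   # 11x10: inputs -> hidden layer
b0 = var['dnn/hiddenlayer_0/biases']    # 1x10
W1 = var['dnn/logits/weights']          # 10x1: hidden layer -> logit
b1 = var['dnn/logits/biases']           # 1x1

def predict(x):
    # x: a 1x11 row (3 continuous features + 8 embedding dimensions)
    hidden = sigmoid(x @ W0 + b0)
    logit = hidden @ W1 + b1
    return sigmoid(logit)   # probability of the positive class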

Use coo_matrix in TensorFlow

I'm doing a matrix factorization in TensorFlow, and I want to use coo_matrix from scipy.sparse because it uses less memory and makes it easy to put all my data into a matrix for training.
Is it possible to use a coo_matrix to initialize a variable in TensorFlow?
Or do I have to create a session and feed the data into TensorFlow using sess.run() with a feed_dict?
I hope you understand my question and my problem; otherwise comment and I will try to clarify it.
The closest thing TensorFlow has to scipy.sparse.coo_matrix is tf.SparseTensor, which is the sparse equivalent of tf.Tensor. It will probably be easiest to feed a coo_matrix into your program.
A tf.SparseTensor is a slight generalization of COO matrices, where the tensor is represented as three dense tf.Tensor objects:
indices: An N x D matrix of tf.int64 values in which each row represents the coordinates of a non-zero value. N is the number of non-zeroes, and D is the rank of the equivalent dense tensor (2 in the case of a matrix).
values: A length-N vector of values, where element i is the value of the element whose coordinates are given on row i of indices.
dense_shape: A length-D vector of tf.int64, representing the shape of the equivalent dense tensor.
For example, you could use the following code, which uses tf.sparse_placeholder() to define a tf.SparseTensor that you can feed, and a tf.SparseTensorValue that represents the actual value being fed:
sparse_input = tf.sparse_placeholder(dtype=tf.float32, shape=[100, 100])
# ...
train_op = ...
coo_matrix = scipy.sparse.coo_matrix(...)
# Wrap `coo_matrix` in the `tf.SparseTensorValue` form that TensorFlow expects.
# SciPy stores the row and column coordinates as separate vectors, so we must
# stack and transpose them to make an indices matrix of the appropriate shape.
tf_coo_matrix = tf.SparseTensorValue(
    indices=np.array([coo_matrix.row, coo_matrix.col]).T,
    values=coo_matrix.data,
    dense_shape=coo_matrix.shape)
Once you have converted your coo_matrix to a tf.SparseTensorValue, you can feed sparse_input with the tf.SparseTensorValue directly:
sess.run(train_op, feed_dict={sparse_input: tf_coo_matrix})

understanding tensorflow sequence_loss parameters

The sequence_loss module's source code has three required parameters; they are listed as outputs, targets, and weights.
Outputs and targets are self-explanatory, but what I'm looking to better understand is the weights parameter.
The other thing I find confusing is the statement that the targets should be the same length as the outputs. What exactly do they mean by the length of a tensor, especially if it's a 3-dimensional tensor?
Think of the weights as a mask applied to the input tensor. In some NLP applications, sentences have different lengths. In order to batch multiple sentences into a minibatch to feed into a neural net, people use a mask matrix to denote which elements in the input tensor are actually valid input. For instance, the weights can be np.ones([batch, max_length]), which means all of the input elements are valid.
We can also use a matrix of the same shape as the labels, such as np.asarray([[1,1,1,0],[1,1,0,0],[1,1,1,1]]) (assuming the labels' shape is 3x4); then the cross-entropy of the first row's last column will be masked out as 0.
You can also use the weights to compute a weighted accumulation of cross-entropy.
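A tiny NumPy sketch of that masking idea (the cross-entropy values are made up for illustration):

import numpy as np

# Per-position cross-entropy for a batch of 3 sequences padded to length 4.
cross_entropy = np.array([[0.5, 0.2, 0.9, 0.4],
                          [0.3, 0.7, 0.1, 0.6],
                          [0.8, 0.2, 0.5, 0.3]])
weights = np.array([[1., 1., 1., 0.],   # valid length 3
                    [1., 1., 0., 0.],   # valid length 2
                    [1., 1., 1., 1.]])  # valid length 4
masked = cross_entropy * weights        # padded positions contribute 0
loss = masked.sum() / weights.sum()     # average over valid positions only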
We used this in a class, and our professor said we could just pass in ones of the right shape (the comment says "list of 1D batch-sized float-Tensors of the same length as logits"). That doesn't help with what they mean, but maybe it will help you get your code to run. It worked for me.
This code should do the trick: [tf.ones(batch_size, tf.float32) for _ in logits].
Edit: from TF code:
for logit, target, weight in zip(logits, targets, weights):
  if softmax_loss_function is None:
    # TODO(irving,ebrevdo): This reshape is needed because
    # sequence_loss_by_example is called with scalars sometimes, which
    # violates our general scalar strictness policy.
    target = array_ops.reshape(target, [-1])
    crossent = nn_ops.sparse_softmax_cross_entropy_with_logits(
        logit, target)
  else:
    crossent = softmax_loss_function(logit, target)
  log_perp_list.append(crossent * weight)
The weights that are passed are multiplied by the loss for that particular logit. So I guess if you want to take a particular prediction extra seriously, you can increase the weight above 1.

TensorFlow Embedding Lookup

I am trying to learn how to build an RNN for speech recognition using TensorFlow. As a start, I wanted to try out some example models put up on the TensorFlow page, TF-RNN.
As advised, I had taken some time to understand how word IDs are embedded into a dense representation (vector representation) by working through the basic version of the word2vec model code. I had an understanding of what tf.nn.embedding_lookup actually does, until I encountered the same function being used with a two-dimensional array in TF-RNN ptb_word_lm.py, when it no longer made sense.
What I thought tf.nn.embedding_lookup does:
Given a 2-d array params and a 1-d array ids, tf.nn.embedding_lookup fetches the rows of params corresponding to the indices given in ids, which determines the dimensionality of the output it returns.
What I am confused about:
When tried with the same params and a 2-d array ids, tf.nn.embedding_lookup returns a 3-d array instead of a 2-d one, and I do not understand why.
I looked up the manual for Embedding Lookup, but I still find it difficult to understand how the partitioning works and what result is returned. I recently tried a simple example with tf.nn.embedding_lookup and it appears that it returns different values each time. Is this behaviour due to the randomness involved in partitioning?
Please help me understand how tf.nn.embedding_lookup works and why it is used in both word2vec_basic.py and ptb_word_lm.py, i.e., what is the purpose of even using it?
There is already an answer here on what tf.nn.embedding_lookup does.
When tried with the same params and a 2-d array ids, tf.nn.embedding_lookup returns a 3-d array instead of a 2-d one, and I do not understand why.
When you had a 1-D list of ids [0, 1], the function would return a list of embeddings [embedding_0, embedding_1] where embedding_0 is an array of shape embedding_size. For instance the list of ids could be a batch of words.
Now, you have a matrix of ids, or a list of lists of ids. For instance, you now have a batch of sentences, i.e. a batch of lists of words, i.e. a list of lists of words.
If your list of sentences is [[0, 1], [0, 3]] (sentence 1 is [0, 1], sentence 2 is [0, 3]), the function will compute a matrix of embeddings, which will be of shape [2, 2, embedding_size] and will look like:
[[embedding_0, embedding_1],
[embedding_0, embedding_3]]
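A minimal runnable sketch of that 2-d case (TF1-style, to match the rest of this thread; the 4x3 params matrix is arbitrary):

import numpy as np
import tensorflow as tf

# Toy embedding matrix: vocabulary of 4 words, embedding_size of 3.
params = tf.constant(np.arange(12, dtype=np.float32).reshape(4, 3))
ids = tf.constant([[0, 1], [0, 3]])       # a batch of two 2-word sentences
emb = tf.nn.embedding_lookup(params, ids)

with tf.Session() as sess:
    print(sess.run(emb).shape)            # (2, 2, 3): ids.shape + [embedding_size]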
Concerning the partition_strategy argument, you don't have to worry about it. Basically, it allows you to pass a list of embedding matrices as params instead of a single matrix, if you have computational limitations.
So, you could split your embedding matrix of shape [1000, embedding_size] into ten matrices of shape [100, embedding_size] and pass this list of Variables as params. The partition_strategy argument handles the distribution of the vocabulary (the 1000 words) among the 10 matrices.
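For illustration, a hedged sketch of that sharded setup, with the shapes from the paragraph above (the variable names are hypothetical):

embedding_size = 3
# Ten shards of 100 rows each, standing in for one [1000, embedding_size] matrix.
shards = [tf.Variable(tf.random_uniform([100, embedding_size]))
          for _ in range(10)]
ids = tf.constant([[0, 1], [0, 3]])
# With the default "mod" strategy, id i lives in shard i % 10, at row i // 10.
emb = tf.nn.embedding_lookup(shards, ids, partition_strategy="mod")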