Ensemble network with categorical distribution in tensorflow

I have n networks, each with the same inputs and outputs. I want to randomly select one of the outputs according to a categorical distribution. tfp.distributions.Categorical outputs only integers, so I tried to do something like
act_dist = tfp.distributions.Categorical(logits=act_logits) # act_logits are all the same, so the distribution is uniform
rand_out = act_dist.sample()
x = nn_out1 * tf.cast(rand_out == 0., dtype=tf.float32) + ... # for all my n networks
But rand_out == 0. is always False, as are the other conditions.
Any idea how to achieve what I need?

You might also look at MixtureSameFamily, which does a gather under the covers for you.
nn_out1 = tf.expand_dims(nn_out1, axis=2)
...
outs = tf.concat([nn_out1, nn_out2, ...], axis=2)
probs = tf.tile(tf.reduce_mean(tf.ones_like(nn_out1), axis=1, keepdims=True) / n, [1, n]) # trick to have ones of shape [None,1]
dist = tfp.distributions.MixtureSameFamily(
    mixture_distribution=tfp.distributions.Categorical(probs=probs),
    components_distribution=tfp.distributions.Deterministic(loc=outs))
x = dist.sample()

I think you need to use tf.equal, because Tensor == 0 is always False.
Separately though, you might want to use OneHotCategorical. For training, you might also try using RelaxedOneHotCategorical.
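For example, a minimal sketch of the tf.equal fix, with toy stand-ins for the question's nn_out1 / nn_out2 / act_logits:
import tensorflow as tf
import tensorflow_probability as tfp
nn_out1 = tf.constant([[1.0, 2.0]])
nn_out2 = tf.constant([[3.0, 4.0]])
act_logits = tf.zeros([2])  # equal logits -> uniform over the two networks
rand_out = tfp.distributions.Categorical(logits=act_logits).sample()
# tf.equal builds a boolean tensor; a plain `rand_out == 0.` compares a Tensor
# object against a Python float, which is always False in TF 1.x graph mode.
x = (nn_out1 * tf.cast(tf.equal(rand_out, 0), tf.float32) +
     nn_out2 * tf.cast(tf.equal(rand_out, 1), tf.float32))
With OneHotCategorical, the sample is already a one-hot vector, so the per-network comparisons can be replaced by a single weighted sum over the stacked outputs.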

Related

gather values from 2dim tensor in tensorflow

Hi, tensorflow beginner here... I'm trying to get the values of certain elements in a 2-dim tensor, in my case class scores from a probability matrix.
The probability matrix is (1000, 81), with batch size 1000 and 81 classes. class_ids is (1000,) and contains the index of the highest class score for each sample. How do I get the corresponding class scores from the probability matrix using tf.gather?
class_ids = tf.cast(tf.argmax(probs, axis=1), tf.int32)
class_scores = tf.gather_nd(probs,class_ids)
class_scores should be a tensor of shape (1000,) containing the highest class_score for each sample.
Right now I'm using a workaround that looks like this:
class_score_count = []
for i in range(probs.shape[0]):
    prob = probs[i, :]
    class_score = prob[class_ids[i]]
    class_score_count.append(class_score)
class_scores = tf.stack(class_score_count, axis=0)
Thanks for the help!
You can do it with tf.gather_nd like this:
class_ids = tf.cast(tf.argmax(probs, axis=1), tf.int32)
# If shape is not dynamic you can use probs.shape[0].value instead of tf.shape(probs)[0]
row_ids = tf.range(tf.shape(probs)[0], dtype=tf.int32)
idx = tf.stack([row_ids, class_ids], axis=1)
class_scores = tf.gather_nd(probs, idx)
You could also just use tf.reduce_max; even though it would compute the maximum again, it may not be much slower if your data is not too big:
class_scores = tf.reduce_max(probs, axis=1)
You need to run the tensor class_ids to get its values.
The values will be a numpy array,
which you can access normally or loop over.
You have to do something like this:
predictions = sess.run(tf.argmax(probs, 1), feed_dict={x: X_data})
The predictions variable has all the information you need;
TensorFlow only returns the values of those tensors which you run explicitly.
I think this is what the batch_dims argument for tf.gather is for.
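For instance, a sketch of that route (assuming TF >= 1.14, where tf.gather gained the batch_dims argument):
import tensorflow as tf
probs = tf.random.uniform([1000, 81])  # stand-in for the probability matrix
class_ids = tf.cast(tf.argmax(probs, axis=1), tf.int32)
# For each row i this picks probs[i, class_ids[i]], giving shape (1000,)
class_scores = tf.gather(probs, class_ids, axis=1, batch_dims=1)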

Minimizing negative log-likelihood of logistic regression, scipy returning warning: "Desired error not necessarily achieved due to precision loss."

I'm trying to sort out why scipy optimize isn't converging on a solution for the minimum negative-log-likelihood of the logistic regression function (as implemented below).
It seems to converge for smaller data sets, but for the larger data sets scipy returns the warning: "Desired error not necessarily achieved due to precision loss."
I thought this was a well-behaved optimization problem, so I'm anxious that I'm missing an obvious mistake.
Can anyone spot a mistake in my implementation or make a suggestion that I might try?
I'm using the default method, but I have had little luck with the other methods that minimize allows.
Many thanks!
Quick summary of the implementation. I'm minimizing the following objective (as implemented in the code below):
0.5 * ||w||^2 + sheepda * ( sum over positive rows of -log(sigmoid(w·x + b)) + sum over negative rows of -log(1 - sigmoid(w·x + b)) )
with the caveat that since b is a constant, I'm using the exponent -(w*x + b). I think I've implemented that function correctly, but maybe I'm not seeing something. Since the data are constants with respect to the function being minimized, I return a closure that captures the data; thus, the function to be minimized only accepts the weights.
The data is a pandas dataframe of the format: rows == samples, columns == attributes, but LAST column == label (0 or 1). I've transformed all the data to make sure it is continuous, and I've normalized it to have a mean of 0 and a standard deviation of 1. I'm also starting with random weights between [0, 0.1], treating the first weight as 'b'.
import numpy as np
from scipy.special import expit
from scipy.stats import uniform
from scipy.optimize import minimize

def get_optimization_func_call(data, sheepda):
    # Extract pos/neg data without label
    pos_df = data[data[LABEL] == 1].as_matrix()[:, :-1]
    neg_df = data[data[LABEL] == 0].as_matrix()[:, :-1]

    # Evaluate positive terms by row
    def eval_pos_row(pos_row, w, b):
        cur_exponent = np.dot(w, pos_row) + b
        cur_val = expit(cur_exponent)
        if cur_val == 0:
            print("pos", cur_exponent)
        return -1 * np.log(cur_val)

    # Evaluate negative terms by row
    def eval_neg_row(neg_row, w, b):
        cur_exponent = np.dot(w, neg_row) + b
        cur_val = 1.0 - expit(cur_exponent)
        if cur_val == 0:
            print("neg", cur_exponent)
        return -1 * np.log(cur_val)

    # The function used for optimization
    def log_likelihood(weights):
        # Separate weights
        w = weights[1:]
        b = weights[0]
        # Get the squared norm of the weights
        w_norm = np.dot(w, w)
        # Sum over positive and negative examples
        pos_sum = np.sum(np.apply_along_axis(eval_pos_row, 1, pos_df, w, b))
        neg_sum = np.sum(np.apply_along_axis(eval_neg_row, 1, neg_df, w, b))
        return (0.5 * w_norm) + sheepda * (pos_sum + neg_sum)

    return log_likelihood

w = uniform.rvs(size=20) / 10.0
LL = get_optimization_func_call(clean_test_data, 0.5)
res = minimize(LL, w, options={"maxiter": 1e4, "disp": True})
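One likely culprit, hinted at by the "pos"/"neg" prints above, is log(0) once the sigmoid saturates for large |w·x + b|. A numerically stable sketch of the same objective (my assumption, not part of the original post), using the identity -log(sigmoid(z)) = log(1 + exp(-z)) via np.logaddexp:
import numpy as np

def get_stable_func_call(pos_df, neg_df, sheepda):
    # pos_df / neg_df are the same matrices extracted above
    def log_likelihood(weights):
        b, w = weights[0], weights[1:]
        # -log(sigmoid(z)) = logaddexp(0, -z); -log(1 - sigmoid(z)) = logaddexp(0, z)
        pos_sum = np.sum(np.logaddexp(0.0, -(pos_df.dot(w) + b)))
        neg_sum = np.sum(np.logaddexp(0.0, neg_df.dot(w) + b))
        return 0.5 * np.dot(w, w) + sheepda * (pos_sum + neg_sum)
    return log_likelihood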

How does tf.nn.moments calculate variance?

Look at the test example:
import tensorflow as tf
x = tf.constant([[1,2],[3,4],[5,6]])
mean, variance = tf.nn.moments(x, [0])
with tf.Session() as sess:
    m, v = sess.run([mean, variance])
    print(m, v)
The output is:
[3 4]
[2 2]
We want to calculate variance along axis 0. The first column is [1,3,5], and its mean = (1+3+5)/3 = 3, which is right, but the variance = [(1-3)^2+(3-3)^2+(5-3)^2]/3 = 2.6667, while the output is 2. Who can tell me how tf.nn.moments calculates variance?
By the way, looking at the API doc, what does shift do?
The problem is that x is an integer tensor, and TensorFlow, instead of forcing a conversion, performs the computation as well as it can without changing the type (so the outputs are also integers). You can pass float numbers when constructing x or specify the dtype parameter of tf.constant:
x = tf.constant([[1,2],[3,4],[5,6]], dtype=tf.float32)
Then you get the expected result:
import tensorflow as tf
x = tf.constant([[1,2],[3,4],[5,6]], dtype=tf.float32)
mean, variance = tf.nn.moments(x, [0])
with tf.Session() as sess:
    m, v = sess.run([mean, variance])
    print(m, v)
>>> [ 3. 4.] [ 2.66666675 2.66666675]
About the shift parameter: it seems to allow you to specify a value to, well, "shift" the input. By shift they mean subtract, so if your input is [1., 2., 4.] and you give a shift of, say, 2.5, TensorFlow would first subtract that amount and compute the moments from [-1.5, 0.5, 1.5]. In general, it seems safe to just leave it as None, which performs a shift by the mean of the input, but I suppose there may be cases where giving a predetermined shift value (e.g. if you know or have an approximate idea of the mean of the input) may yield better numerical stability.
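As a rough numpy illustration of that idea (my sketch, not TensorFlow's actual internals): with a shift c, the moments can be computed from x - c and the shift added back afterwards:
import numpy as np
x = np.array([1., 2., 4.])
c = 2.5                                     # the shift value
d = x - c                                   # [-1.5, 0.5, 1.5]
mean = c + d.mean()                         # 2.3333..., same as x.mean()
variance = (d ** 2).mean() - d.mean() ** 2  # 1.5555..., same as x.var()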
# Replace the following line, which uses an integer dtype,
x = tf.constant([[1,2],[3,4],[5,6]])
# with a float dtype if you don't want TensorFlow to truncate the decimals:
x = tf.constant([[1,2],[3,4],[5,6]], dtype=tf.float32)
Result: array([ 2.66666675, 2.66666675], dtype=float32)
Note: in the original implementation, shift is not actually used.

Tensorflow embedding lookup with unequal sized lists

Hi guys,
I'm trying to project multi-labeled categorical data into a dense space using embeddings.
Here's a toy example. Let's say I have four categories and want to project them into a 2D space. Furthermore, I have two instances, the first one belonging to category 0 and the second one to category 1.
The code will look something like this:
sess = tf.InteractiveSession()
embeddings = tf.Variable(tf.random_uniform([4, 2], -1.0, 1.0))
sess.run(tf.global_variables_initializer())
y = tf.nn.embedding_lookup(embeddings, [0,1])
y.eval()
which returns something like this:
array([[ 0.93999457, -0.83051205],
       [-0.1699729 ,  0.73936272]], dtype=float32)
So far, so good. Now imagine an instance belongs to two categories. The embedding lookup will return two vectors which I can reduce by mean for example:
y = tf.nn.embedding_lookup(embeddings, [[0,1],[1,2]]) # two categories
y_ = tf.reduce_mean(y, axis=1)
y_.eval()
This also works just as I expect. My problem arises when the instances in my batch do not belong to the same number of categories, e.g.:
y = tf.nn.embedding_lookup(embeddings, [[0,1],[1,2,3]]) # unequal sized lists
y_ = tf.reduce_mean(y, axis=1)
y_.eval()
ValueError: Argument must be a dense tensor: [[0, 1], [1, 2, 3]] - got shape [2], but wanted [2, 2].
Any idea about how to get around this problem?
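One possible workaround (my suggestion, not from the original post): encode the ragged category lists as a tf.SparseTensor and use tf.nn.embedding_lookup_sparse, which applies a combiner such as "mean" per row:
import tensorflow as tf
sess = tf.InteractiveSession()
embeddings = tf.Variable(tf.random_uniform([4, 2], -1.0, 1.0))
sess.run(tf.global_variables_initializer())
# [[0, 1], [1, 2, 3]] encoded as a sparse tensor of int64 ids
sp_ids = tf.SparseTensor(
    indices=[[0, 0], [0, 1], [1, 0], [1, 1], [1, 2]],
    values=tf.constant([0, 1, 1, 2, 3], dtype=tf.int64),
    dense_shape=[2, 3])
# sp_weights=None weights every id equally; combiner="mean" averages per row
y_ = tf.nn.embedding_lookup_sparse(embeddings, sp_ids, None, combiner="mean")
y_.eval()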

masked softmax in theano

I am wondering if it is possible to apply a mask before performing theano.tensor.nnet.softmax?
This is the behavior I am looking for:
>>> a = np.array([[1,2,3,4]])
>>> m = np.array([[1,0,1,0]])  # ignore indices 1 and 3
>>> theano.tensor.nnet.softmax(a, m)
array([[ 0.11920292, 0. , 0.88079708, 0. ]])
Note that a and m are matrices, so I would like the softmax to work on an entire matrix and perform a row-wise masked softmax.
Also, the output should be the same shape as a, so the solution cannot use advanced indexing, e.g. theano.tensor.softmax(a[0,[0,2]]).
def masked_softmax(a, m, axis):
    e_a = T.exp(a)
    masked_e = e_a * m
    sum_masked_e = T.sum(masked_e, axis, keepdims=True)
    return masked_e / sum_masked_e
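A quick check of masked_softmax against the example from the question (a sketch; assumes the definition above is in scope):
import numpy as np
import theano
import theano.tensor as T
a = T.matrix('a')
m = T.matrix('m')
f = theano.function([a, m], masked_softmax(a, m, axis=1))
print(f(np.array([[1., 2., 3., 4.]]), np.array([[1., 0., 1., 0.]])))
# -> approximately [[ 0.1192  0.      0.8808  0.    ]]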
theano.tensor.switch is one way to do this.
In the computational graph you can do the following:
a_mask = theano.tensor.switch(m, a, np.NINF)
sm = theano.tensor.nnet.softmax(a_mask)
Hope it helps others.