tf.nn.sparse_softmax_cross_entropy_with_logits - labels without one-hot encoding in TensorFlow

I am trying to understand how tf.nn.sparse_softmax_cross_entropy_with_logits works.
Description says:
A common use case is to have logits of shape [batch_size, num_classes]
and labels of shape [batch_size]. But higher dimensions are supported.
So it suggests that we can feed labels in raw form, for example [1, 2, 3].
Now, since all computations are done per batch, I believe the following is possible:
In all cases we assume a batch size of two.
Case 1 (with one batch):
logit:
0.4 0.2 0.4
0.3 0.3 0.4
corresponding labels:
2
3
I am guessing labels might be coded as
[1 0 0]
[0 1 0]
Case 2 (with another batch):
logit:
0.4 0.2 0.4
0.3 0.3 0.4
corresponding labels:
1
2
I am guessing the labels might be coded as follows (I do not see what prevents us from this coding, unless TensorFlow keeps track of how it coded labels before):
[1 0 0]
[0 1 0]
So we have two different codings. Is it safe to assume that TensorFlow keeps the coding consistent from batch to batch?

There is no real coding happening. The labels are just the position of the 1 in the corresponding one-hot vector:
0 -> [1, 0, 0]
1 -> [0, 1, 0]
2 -> [0, 0, 1]
This "coding" will be used in every batch.

Related

Cosine similarity and cosine distance formulas relation

Can someone explain these two formulas? Do they have any relationship?
def _cosine_distance(a, b, data_is_normalized=False):
    if not data_is_normalized:
        a = np.asarray(a) / np.linalg.norm(a, axis=1, keepdims=True)
        b = np.asarray(b) / np.linalg.norm(b, axis=1, keepdims=True)
    return 1. - np.dot(a, b.T)

def findCosineSimilarity(source_representation, test_representation):
    a = np.matmul(np.transpose(source_representation), test_representation)
    b = np.sum(np.multiply(source_representation, source_representation))
    c = np.sum(np.multiply(test_representation, test_representation))
    return 1 - (a / (np.sqrt(b) * np.sqrt(c)))
Regarding your comment, the cosine distance of two matrices of shape 2 x 5 essentially consists of finding the pairwise cosine distance between the vectors in each array. Assuming you are working with row vectors (which you should when you use NumPy conventionally), the expected output should consist of 2 * 2 = 4 elements. If you are working with column vectors, then 5 * 5 = 25 elements makes sense.
_cosine_distance looks good
The function _cosine_distance is correct in naming and implementation, generally for all cases where a is in R^{n x l} and b is in R^{m x l}.
To use _cosine_distance for 1D arrays you can simply add a singleton dimension at axis 0, e.g. _cosine_distance(a[np.newaxis], b[np.newaxis]).
findCosineSimilarity looks bad
findCosineSimilarity is incorrectly named (it calculates the cosine distance), and the implementation only works for one-dimensional arrays. Using it for anything other than 1D arrays will compute something that is incorrect by the definition of cosine distance. Also, transposing source_representation (the left matrix) hints that the function is meant for column vectors, which differs from _cosine_distance; not that findCosineSimilarity would work for matrices anyway.
It is easy to create a column/row-vector-agnostic test case by using an n x n matrix:
1 1 1 1
1 1 1 1
1 1 1 1
1 1 1 1
If we calculate the pairwise cosine distance for every vector in the matrix we should get all zeros as the vectors are the same.
import numpy as np

def findCosineSimilarity(source_representation, test_representation):
    a = np.matmul(np.transpose(source_representation), test_representation)
    b = np.sum(np.multiply(source_representation, source_representation))
    c = np.sum(np.multiply(test_representation, test_representation))
    return 1 - (a / (np.sqrt(b) * np.sqrt(c)))

def _cosine_distance(a, b, data_is_normalized=False):
    if not data_is_normalized:
        a = np.asarray(a) / np.linalg.norm(a, axis=1, keepdims=True)
        b = np.asarray(b) / np.linalg.norm(b, axis=1, keepdims=True)
    return 1. - np.dot(a, b.T)

a = np.array([
    [1, 1, 1, 1],
    [1, 1, 1, 1],
    [1, 1, 1, 1],
    [1, 1, 1, 1]
])

print(findCosineSimilarity(a, a))
print(_cosine_distance(a, a))
Output:
[[0.75 0.75 0.75 0.75]
[0.75 0.75 0.75 0.75]
[0.75 0.75 0.75 0.75]
[0.75 0.75 0.75 0.75]]
[[0. 0. 0. 0.]
[0. 0. 0. 0.]
[0. 0. 0. 0.]
[0. 0. 0. 0.]]
We see that findCosineSimilarity fails, and that _cosine_distance is correct.
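If you want a pairwise cosine distance for matrices of row vectors, a minimal sketch (the name pairwise_cosine_distance is mine, not from the original code) does essentially what _cosine_distance already does: normalize each row, then take the matrix product:

import numpy as np

def pairwise_cosine_distance(a, b):
    # Assumes 2-D arrays of row vectors; returns an (n x m) matrix of distances.
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return 1.0 - a @ b.T

a = np.ones((4, 4))
print(pairwise_cosine_distance(a, a))  # all zeros, as expected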

Interpolating lines of a Polygon

Let's suppose we have 5 (x, y) points which make a closed loop, i.e. a polygon. How can I interpolate or upsample some points so that the polygon has a more round-ish look instead of sharp linear lines between two points? E.g. see the image: what I have is on the left and what I want is on the right.
A simple MATLAB code is as follows:
xv = [0 2 3 2.5 1 0];
yv = [1 0 2 3.5 3.5 1];
plot(xv, yv)
xlim([-1 4])
ylim([-2 5])
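The image is not reproduced here, but as one possible approach (a rough sketch in Python using SciPy's periodic parametric spline, reusing the same vertices as the MATLAB snippet), you can fit a closed spline through the points and sample it densely:

import numpy as np
from scipy.interpolate import splprep, splev

# Polygon vertices (without repeating the first point at the end).
xv = np.array([0, 2, 3, 2.5, 1])
yv = np.array([1, 0, 2, 3.5, 3.5])

# Fit a periodic (closed) spline through the vertices and evaluate it
# at many parameter values to get a smoother, rounder outline.
tck, _ = splprep([xv, yv], s=0, per=True)
x_smooth, y_smooth = splev(np.linspace(0, 1, 200), tck)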

How to get confusion matrix of Tensorflow hub model

I have trained a model based on transfer learning via TensorFlow Hub. I have been looking in many places for hints on producing a confusion matrix, but I haven't been able to get to the right solution.
Does anyone know if this is possible?
The last thing I tried was to write the results in an Excel sheet, but I couldn't find a formula for the multi-class computation of a confusion matrix in Excel.
Any help would be great!
You can try the tf.math.confusion_matrix function.
It computes the confusion matrix from predictions and labels.
See https://www.tensorflow.org/api_docs/python/tf/math/confusion_matrix.
Example:
my_confusion_matrix = tf.math.confusion_matrix(labels=[1, 2, 4], predictions=[2, 2, 3])
with tf.Session() as sess:
    print(sess.run(my_confusion_matrix))

# Prints:
# [[0 0 0 0 0]
#  [0 0 1 0 0]
#  [0 0 1 0 0]
#  [0 0 0 0 0]
#  [0 0 0 1 0]]
# Labels are assumed to be [0, 1, 2, 3, 4], thus resulting in a 5x5 confusion matrix.
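If you are on TensorFlow 2.x (assumed here), eager execution is enabled by default and no Session is needed:

import tensorflow as tf

# Same call as above, evaluated eagerly under TF 2.x.
cm = tf.math.confusion_matrix(labels=[1, 2, 4], predictions=[2, 2, 3])
print(cm.numpy())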

tensorflow reduce_mean with multidimension second argument

I came across a usage of reduce_mean with a vector as the second argument. I looked through the TensorFlow manual but couldn't find a corresponding example. The code is below:
tf.reduce_mean(train, [0, 1, 2])
where train has size batchsize x H x L x 2.
I also played with some experiments but couldn't figure out how this second vector argument is processed:
tensor = tf.constant([[[2, 2, 4], [2, 2, 0]], [[2, 2, 0], [2, 2, 0]]])
trainenergy = tf.reduce_mean(tensor, [0, 1, 2])
# Output = 1

tensor = tf.constant([[[2, 2, 4], [2, 2, 0]], [[2, 2, 0], [2, 2, 0]]])
trainenergy = tf.reduce_mean(tensor, [0])
# Output = [[2 2 2]
#           [2 2 0]]

tensor = tf.constant([[[2, 2, 4], [2, 2, 0]], [[2, 2, 0], [2, 2, 0]]])
trainenergy = tf.reduce_mean(tensor, [0, 1])
# Output = [2 2 1]
When the second argument is a vector, tf.reduce_mean(train, [0, 1, 2]) reduces over every axis listed in that vector. For example, [0, 1, 2] reduces along axes 0, 1 and 2, averaging all those dimensions away, while [0, 1] reduces only along axes 0 and 1 and keeps the last axis.
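As a small sketch (assuming TensorFlow 2.x eager execution), reproducing the experiments above also shows why the scalar result is 1 rather than 1.67:

import tensorflow as tf

tensor = tf.constant([[[2, 2, 4], [2, 2, 0]],
                      [[2, 2, 0], [2, 2, 0]]])

# Reducing over axes [0, 1, 2] averages every element into a scalar.
# With an integer dtype the mean is truncated, hence the output of 1.
print(tf.reduce_mean(tensor, [0, 1, 2]))                       # 1
print(tf.reduce_mean(tf.cast(tensor, tf.float32), [0, 1, 2]))  # 1.6666666

# Reducing over [0, 1] averages over the first two axes and keeps the last.
print(tf.reduce_mean(tensor, [0, 1]))                          # [2 2 1]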

Bernoulli random number generator

I cannot understand how the Bernoulli random number generator used in NumPy works and would like some explanation of it. For example:
np.random.binomial(size=3, n=1, p= 0.5)
Results:
[1 0 0]
n = number of trials
p = probability of occurrence
size = number of experiments
How are the generated numbers/results of "0" or "1" determined?
=================================Update==================================
I created a Restricted Boltzmann Machine which always produces the same results despite being "random" across multiple code executions. The randomness is seeded using
np.random.seed(10)
import numpy as np

np.random.seed(10)

def sigmoid(u):
    return 1 / (1 + np.exp(-u))

def gibbs_vhv(W, hbias, vbias, x):
    f_s = sigmoid(np.dot(x, W) + hbias)
    h_sample = np.random.binomial(size=f_s.shape, n=1, p=f_s)
    f_u = sigmoid(np.dot(h_sample, W.transpose()) + vbias)
    v_sample = np.random.binomial(size=f_u.shape, n=1, p=f_u)
    return [f_s, h_sample, f_u, v_sample]

def reconstruction_error(f_u, x):
    cross_entropy = -np.mean(
        np.sum(
            x * np.log(sigmoid(f_u)) + (1 - x) * np.log(1 - sigmoid(f_u)),
            axis=1))
    return cross_entropy

X = np.array([[1, 0, 0, 0]])

# Weights from visible to hidden layer
W = np.array([[-3.85, 10.14, 1.16],
              [6.69, 2.84, -7.73],
              [1.37, 10.76, -3.98],
              [-6.18, -5.89, 8.29]])

hbias = np.array([1.04, -4.48, 2.50])          # <= 3 biases for 3 neurons in the hidden layer
vbias = np.array([-6.33, -1.68, -1.25, 3.45])  # <= 4 biases for 4 neurons in the input layer

k = 2
v_sample = X
for i in range(k):
    [f_s, h_sample, f_u, v_sample] = gibbs_vhv(W, hbias, vbias, v_sample)
    start = v_sample
    if i < 2:
        print('f_s:', f_s)
        print('h_sample:', h_sample)
        print('f_u:', f_u)
        print('v_sample:', v_sample)
    print(v_sample)
    print('iter:', i, ' h:', h_sample, ' x:', v_sample,
          ' entropy:%.3f' % reconstruction_error(f_u, v_sample))
Results:
[[1 0 0 0]]
f_s: [[ 0.05678618 0.99652957 0.97491304]]
h_sample: [[0 1 1]]
f_u: [[ 0.99310473 0.00139984 0.99604968 0.99712837]]
v_sample: [[1 0 1 1]]
[[1 0 1 1]]
iter: 0 h: [[0 1 1]] x: [[1 0 1 1]] entropy:1.637
f_s: [[ 4.90301318e-04 9.99973278e-01 9.99654440e-01]]
h_sample: [[0 1 1]]
f_u: [[ 0.99310473 0.00139984 0.99604968 0.99712837]]
v_sample: [[1 0 1 1]]
[[1 0 1 1]]
iter: 1 h: [[0 1 1]] x: [[1 0 1 1]] entropy:1.637
I am asking how the algorithm works to produce the numbers. – WhiteSolstice (from the comments)
Non-technical explanation
If you pass n=1 to the Binomial distribution it is equivalent to the Bernoulli distribution. In this case the function can be thought of as simulating coin flips. size=3 tells it to flip the coin three times, and p=0.5 makes it a fair coin with equal probability of heads (1) or tails (0).
The result [1 0 0] means the coin came down once with heads and twice with tails facing up. This is random, so running it again could result in a different sequence like [1 1 0], [0 1 0], or maybe even [1 1 1]. Although you cannot get the same number of 1s and 0s in three flips, on average you would get the same number of each.
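For instance, a quick sketch (the seed is only for reproducibility; a different seed, or no seed, gives a different sequence of 0s and 1s):

import numpy as np

np.random.seed(10)
# Three flips of a fair coin: each element is 1 (heads) with probability 0.5.
print(np.random.binomial(size=3, n=1, p=0.5))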
Technical explanation
NumPy implements random number generation in C. The source code for the Binomial distribution can be found here. Two different algorithms are actually implemented, depending on the parameters:
If n * p <= 30 it uses inverse transform sampling.
If n * p > 30 the BTPE algorithm of Kachitvichyanukul and Schmeiser (1988) is used. (The publication is not freely available.)
I think both methods, but certainly the inverse transform sampling, depend on a random number generator to produce uniformly distributed random numbers. NumPy internally uses a Mersenne Twister pseudo-random number generator. The uniform random numbers are then transformed into the desired distribution.
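To illustrate the idea (this is only a sketch of inverse transform sampling for a single Bernoulli draw, not NumPy's actual C implementation):

import numpy as np

def bernoulli_inverse_transform(p):
    # Draw u ~ Uniform(0, 1) and return 1 if u < p, else 0
    # (i.e. invert the Bernoulli CDF).
    u = np.random.uniform()
    return 1 if u < p else 0

np.random.seed(10)
print([bernoulli_inverse_transform(0.5) for _ in range(3)])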
A Binomially distributed random variable has two parameters n and p, and can be thought of as the distribution of the number of heads obtained when flipping a biased coin n times, where the probability of getting a head at each flip is p. (More formally it is a sum of independent Bernoulli random variables with parameter p).
For instance, if n=10 and p=0.5, one could simulate a draw from Bin(10, 0.5) by flipping a fair coin 10 times and summing the number of times that the coin lands heads.
In addition to the n and p parameters described above, np.random.binomial has an additional size parameter. If size=1, np.random.binomial computes a single draw from the Binomial distribution. If size=k for some integer k, k independent draws from the same Binomial distribution are computed. size can also be a tuple of dimensions (a shape), in which case a whole np.array of that shape is filled with independent draws from the Binomial distribution.
Note that the Binomial distribution is a generalisation of the Bernoulli distribution - in the case that n=1, Bin(n,p) has the same distribution as Ber(p).
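As a sketch of this relationship (the seed and variable names are illustrative only):

import numpy as np

np.random.seed(0)

# One draw from Bin(10, 0.5), simulated as the sum of 10 Bernoulli(0.5) flips.
manual_draw = np.random.binomial(size=10, n=1, p=0.5).sum()

# The same distribution drawn directly; size=5 returns 5 independent draws.
direct_draws = np.random.binomial(size=5, n=10, p=0.5)

# size can also be a shape, filling a whole array with independent draws.
grid_draws = np.random.binomial(size=(2, 3), n=10, p=0.5)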
For more information about the binomial distribution see: https://en.wikipedia.org/wiki/Binomial_distribution