I have been trying for a while to implement sampled softmax because I have half a million output classes.
I have tried to follow the official documentation exactly, but I always get an error. This is my code:
def forward_propagation_sampled(X, parameters):
    W1 = parameters['W1']
    b1 = parameters['b1']
    W2 = parameters['W2']
    b2 = parameters['b2']
    W3 = parameters['W3']
    b3 = parameters['b3']
    Z1 = tf.add(tf.matmul(W1, X), b1)
    A1 = tf.nn.relu(Z1)
    Z2 = tf.add(tf.matmul(W2, A1), b2)
    A2 = tf.nn.relu(Z2)
    Z3 = tf.add(tf.matmul(W3, A2), b3)
    return Z3, W3, b3
This is the cost computation function:
def compute_cost(Z3, W3, b3, Y, mode):
    Z3.set_shape([1144, 1])
    if mode == "train":
        loss = tf.nn.sampled_softmax_loss(
            weights=tf.transpose(W3),
            biases=tf.Variable(b3),
            labels=tf.reshape(tf.argmax(Y, 1), [-1, 1]),  # since Y is one-hot encoded
            inputs=tf.Variable(initial_value=Z3, dtype=tf.float32, expected_shape=[1144, 1]),
            num_sampled=2000,
            num_classes=1144,
            partition_strategy="div"
        )
    elif mode == "eval":
        logits = tf.matmul(inputs, tf.transpose(weights))
        logits = tf.nn.bias_add(logits, biases)
        labels_one_hot = tf.one_hot(labels, n_classes)
        loss = tf.nn.softmax_cross_entropy_with_logits(labels=labels_one_hot, logits=logits)
    cost = tf.reduce_mean(loss)
    return cost
For the purpose of just testing this out, I am using 1144 output classes, which would otherwise scale to 500,000. There are 3144 training examples.
I get this error:
Shape must be rank 1 but is rank 2 for 'sampled_softmax_loss/Slice_1' (op: 'Slice') with input shapes: [3144,1], [1], [1].
I am unable to debug this or make any sense out of it. Any help would be really appreciated.
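For reference, the TensorFlow 1.x documentation describes the following argument shapes for tf.nn.sampled_softmax_loss. The sketch below only illustrates those shapes; the variable names and sizes are made up and not taken from the code above:

import tensorflow as tf

batch_size, dim, num_classes = 32, 128, 1144  # illustrative sizes

# Output projection, in the orientation the loss expects.
weights = tf.Variable(tf.truncated_normal([num_classes, dim], stddev=0.05))  # [num_classes, dim]
biases = tf.Variable(tf.zeros([num_classes]))                                # [num_classes], rank 1

inputs = tf.placeholder(tf.float32, [batch_size, dim])  # last hidden-layer activations, [batch_size, dim]
labels = tf.placeholder(tf.int64, [batch_size, 1])      # class indices, [batch_size, num_true]

loss = tf.reduce_mean(tf.nn.sampled_softmax_loss(
    weights=weights,
    biases=biases,
    labels=labels,
    inputs=inputs,
    num_sampled=64,
    num_classes=num_classes,
    partition_strategy="div"))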
Related
I tried to train an ANN model using both raw matrix multiplication and tf.layers.dense(), but I got different results: the model built with matrix multiplication cannot optimize the loss function (the loss increases). What is the difference between the two methods?
ANN model using matrix multiplication:
W1 = tf.Variable(tf.zeros([4,64]))
b1 = tf.Variable(tf.zeros([64]))
y1 = tf.nn.relu(tf.matmul(x, W1) + b1)
W2 = tf.Variable(tf.zeros([64,64]))
b2 = tf.Variable(tf.zeros([64]))
y2 = tf.nn.relu(tf.matmul(y1, W2) + b2)
W3 = tf.Variable(tf.zeros([64,64]))
b3 = tf.Variable(tf.zeros([64]))
y3 = tf.nn.relu(tf.matmul(y2, W3) + b3)
W4 = tf.Variable(tf.zeros([64,3]))
b4 = tf.Variable(tf.zeros([3]))
y_out = tf.nn.softmax(tf.matmul(y3, W4) + b4)
ANN model using tf.layers.dense():
layer1 = tf.layers.dense(x, 64, activation=tf.nn.relu)
layer2 = tf.layers.dense(layer1, 64, activation=tf.nn.relu)
layer3 = tf.layers.dense(layer2, 64, activation=tf.nn.relu)
layer4 = tf.layers.dense(layer3, 64, activation=tf.nn.relu)
layer5 = tf.layers.dense(layer4, 64, activation=tf.nn.relu)
layer6 = tf.layers.dense(layer5, 64, activation=tf.nn.relu)
y_out = tf.layers.dense(layer6, 3 , activation = tf.nn.softmax)
You are initializing the weights with zeros, which effectively prevents the network from learning anything: the network always outputs zero, so the gradient is always zero.
Initialize your weights with random values, e.g. drawn from a uniform or Gaussian distribution with a small range (less than 0.1).
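For illustration, here is a minimal sketch of the matrix-multiplication model from the question with randomly initialized weights instead of zeros; the layer shapes come from the question, while the stddev of 0.05 is just an example choice:

# Small random weights break the symmetry; the biases can stay at zero,
# since it is the zero weight matrices in front of the ReLUs that kill the gradient.
W1 = tf.Variable(tf.random_normal([4, 64], stddev=0.05))
b1 = tf.Variable(tf.zeros([64]))
y1 = tf.nn.relu(tf.matmul(x, W1) + b1)

W2 = tf.Variable(tf.random_normal([64, 64], stddev=0.05))
b2 = tf.Variable(tf.zeros([64]))
y2 = tf.nn.relu(tf.matmul(y1, W2) + b2)

W3 = tf.Variable(tf.random_normal([64, 64], stddev=0.05))
b3 = tf.Variable(tf.zeros([64]))
y3 = tf.nn.relu(tf.matmul(y2, W3) + b3)

W4 = tf.Variable(tf.random_normal([64, 3], stddev=0.05))
b4 = tf.Variable(tf.zeros([3]))
y_out = tf.nn.softmax(tf.matmul(y3, W4) + b4)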
I use a pre-trained network from TensorFlow Hub and pass the resulting feature vector through 2 fully connected layers. I initialize the weight matrices with He initialization and the biases with 0.
The loss function is behaving strangely. Training does update the weight matrices somewhat, but mainly the biases change.
Does anybody know how to improve the learning?
Thanks in advance!
with tf.name_scope('tf_hub'):
    module = hub.Module("https://tfhub.dev/google/imagenet/pnasnet_large/feature_vector/2")
    tf_hub_features = module(X)  # Features with shape [batch_size, num_features].

he_initializer = tf.contrib.layers.variance_scaling_initializer(factor=2.0, mode='FAN_IN', uniform=False)

with tf.name_scope('Hidden1'):
    W1 = tf.get_variable(initializer=he_initializer,
                         shape=[Constants.PNAS_NET2_NB_FEATURES, config["h1_nb_units"]],
                         name="W1")
    # W1 = tf.Variable(tf.random_normal([Constants.PNAS_NET2_NB_FEATURES, config["h1_nb_units"]]), name="W1")
    tf.summary.histogram("W1", W1)
    b1 = tf.Variable(tf.zeros([config["h1_nb_units"]]), name="b1")
    tf.summary.histogram("b1", b1)
    o1 = tf.nn.relu(tf.matmul(tf_hub_features, W1) + b1, name="o1")
    # dropout1 = tf.layers.dropout(inputs=o1, rate=config["keep_probability"], name="dropout1")

with tf.name_scope('Hidden2'):
    W2 = tf.get_variable(initializer=he_initializer,
                         shape=[config["h1_nb_units"], config["h2_nb_units"]],
                         name="W2")
    tf.summary.histogram("W2", W2)
    b2 = tf.Variable(tf.zeros([config["h2_nb_units"]]), name="b2")
    tf.summary.histogram("b2", b2)
    o2 = tf.nn.relu(tf.matmul(o1, W2) + b2, name="o2")

with tf.name_scope('Y'):
    WY = tf.get_variable(initializer=he_initializer,
                         shape=[config["h2_nb_units"], config["output_dim"]],
                         name="WY")
    tf.summary.histogram("WY", WY)
    bY = tf.Variable(tf.zeros([config["output_dim"]]), name="bY")
    tf.summary.histogram("bY", bY)
    Y_star = tf.add(tf.matmul(o2, WY), bY, name="Y_star")
    Y = tf.nn.sigmoid(Y_star, name="Y")

with tf.name_scope('loss'):
    Y_ = tf.placeholder(tf.float32, shape=(None, 1), name="Y_")
    loss = tf.losses.log_loss(Y_, Y)  # Y is the sigmoid output defined above
    optimizer = tf.train.AdamOptimizer(config["learning_rate"])
    train_step = optimizer.minimize(loss)
The answer is quite simple: I had an error in feeding the input. The inputs were almost all zeros with only a few ones, so the weights changed very little. I suppose the biases still adjusted because they learn something like the "mean", as in linear regression.
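For what it's worth, a quick sanity check on the feed values would have caught this; a minimal sketch, assuming the batch is available as a NumPy array X_batch (the name is illustrative):

import numpy as np

# Inspect the batch before feeding it; a near-zero mean and variance,
# or a very high fraction of exact zeros, usually points to a feeding bug.
print("min/max:", X_batch.min(), X_batch.max())
print("mean/std:", X_batch.mean(), X_batch.std())
print("fraction of exact zeros:", np.mean(X_batch == 0.0))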
I want to understand how L2 regularization is implemented here. In L2 regularization we add the squared weights to the loss function, but in this code we are also adding the bias terms. Why is that?
x = tf.placeholder(tf.float32, [None, nPixels])
W1 = tf.Variable(tf.random_normal([nPixels, nNodes1], stddev=0.01))
b1 = tf.Variable(tf.zeros([nNodes1]))
y1 = tf.nn.sigmoid(tf.matmul(x, W1) + b1)
W2 = tf.Variable(tf.random_normal([nNodes1, nLabels], stddev=0.01))
b2 = tf.Variable(tf.zeros([nLabels]))
y = tf.matmul(y1, W2) + b2
y_ = tf.placeholder(tf.float32, [None, nLabels])
l2_loss = (tf.nn.l2_loss(W1) + tf.nn.l2_loss(b1) +
           tf.nn.l2_loss(W2) + tf.nn.l2_loss(b2))
cross_entropy = tf.reduce_mean(
    tf.nn.sigmoid_cross_entropy_with_logits(labels=y_, logits=y))
regularized_cross_entropy = cross_entropy + beta * l2_loss
The bias here is not the same thing as the L2 penalty itself; it is the bias we add in the neural network so the pre-activation values are not forced to be zero.
Adding it to the L2 loss is a design choice: it is common to regularize only the weights, since penalizing the biases usually has little effect.
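If you want the weights-only convention, a minimal variant of the loss from the question would look like this (same variables as above; only the regularizer changes):

# Penalize only the weight matrices; leave the biases unregularized.
l2_loss = tf.nn.l2_loss(W1) + tf.nn.l2_loss(W2)

cross_entropy = tf.reduce_mean(
    tf.nn.sigmoid_cross_entropy_with_logits(labels=y_, logits=y))
regularized_cross_entropy = cross_entropy + beta * l2_loss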
I am trying to estimate the forward pass and the backward gradient of the function below:

def func(img_batch, X1, X2):
    L = 1
    A1 = X1 * L**2
    A2 = X2 * L**2
    AA1 = A1 * A1
    AA2 = A2 * A2
    A1A2 = A1 * A2
    v = tf.nn.conv2d(img_batch, A1A2, strides=[1, 1, 1, 1], padding='SAME')
    v = v + AA1 + AA2
    return v
When I add this function to the network, the gradient is computed for every operation in the function by default.
How can I compute this function in the forward pass while ignoring the gradients of its individual operations, and instead supply my own gradient estimate to be added to the main gradient of the model?
You can use py_func to ignore the gradients in this function, and use gradient_override_map to provide customized gradients. Here is an example:
import tensorflow as tf

def myfunc(X1, X2):
    L = 1
    A1 = X1 * L**2
    A2 = X2 * L**2
    AA1 = A1 * A1
    AA2 = A2 * A2
    A1A2 = A1 * A2
    ...
    v = AA1 + AA2 + A1A2
    return v

@tf.RegisterGradient("GradMyfunc")
def grad_myfunc(op, grad):
    X1 = op.inputs[0]
    X2 = op.inputs[1]
    return [grad * X2, grad * X1]

X1 = tf.Variable(tf.constant(1.1, dtype=tf.float64))
X2 = tf.Variable(tf.constant(2.2, dtype=tf.float64))
g = tf.get_default_graph()
with g.gradient_override_map({"PyFunc": "GradMyfunc"}):
    y = tf.py_func(myfunc, [X1, X2], [tf.float64])

with tf.Session() as sess:
    grad = tf.gradients(y, [X1, X2])
    sess.run(tf.global_variables_initializer())
    print(sess.run(y))
    print(sess.run(grad))
I am new to tensorflow and have tried to implement a simple one-layer linear network similar to https://www.tensorflow.org/get_started/mnist/beginners
x = tf.placeholder(tf.float32, [None, IN_SIZE], name="input")
W1 = tf.Variable(tf.zeros([IN_SIZE, OUT_SIZE]), name="Weight1")
b1 = tf.Variable(tf.zeros([OUT_SIZE]), name="bias1")
y = tf.matmul(x, W1) + b1
y_ = tf.placeholder(tf.float32, [None, OUT_SIZE], name="target")
cross_entropy = tf.reduce_mean(
tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=y))
train_step = tf.train.AdamOptimizer(1e-3).minimize(cross_entropy)
The program works as expected and I have no problem with it. However, when I try to add another layer, I find that the learned W1, b1, and W2 are all zero matrices, and only the bias b2 contains nonzero values. Below is my modified network:
x = tf.placeholder(tf.float32, [None, IN_SIZE], name="input")
W1 = tf.Variable(tf.zeros([IN_SIZE, L1_SIZE]), name="Weight1")
b1 = tf.Variable(tf.zeros([L1_SIZE]), name="bias1")
y = tf.matmul(x, W1) + b1
W2 = tf.Variable(tf.zeros([L1_SIZE, OUT_SIZE]), name="Weight2")
b2 = tf.Variable(tf.zeros([OUT_SIZE]), name="bias2")
y = tf.nn.relu(y)
y = tf.matmul(y, W2) + b2
# Define loss and optimizer
y_ = tf.placeholder(tf.float32, [None, OUT_SIZE], name="target")
cross_entropy = tf.reduce_mean(
tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=y))
train_step = tf.train.AdamOptimizer(1e-3).minimize(cross_entropy)
The problem is that if you initialize the weight matrices before a ReLU with zeros, the gradients will always be zero and no learning will happen. You need to use random initialization.
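For example, here is a minimal sketch of the modified network with randomly initialized weight matrices; it uses a Glorot/Xavier-style initializer via tf.get_variable, which is just one reasonable choice:

W1 = tf.get_variable("Weight1", shape=[IN_SIZE, L1_SIZE],
                     initializer=tf.glorot_uniform_initializer())
b1 = tf.Variable(tf.zeros([L1_SIZE]), name="bias1")
h1 = tf.nn.relu(tf.matmul(x, W1) + b1)

W2 = tf.get_variable("Weight2", shape=[L1_SIZE, OUT_SIZE],
                     initializer=tf.glorot_uniform_initializer())
b2 = tf.Variable(tf.zeros([OUT_SIZE]), name="bias2")
y = tf.matmul(h1, W2) + b2

# The rest (the target placeholder y_, softmax cross-entropy, and AdamOptimizer) stays unchanged.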