How does TensorFlow handle the differentials for L1 regularization?

It seems that you can just declare a cost function using tf.abs() and then pass it to automatic gradient generation (see https://github.com/nfmcclure/tensorflow_cookbook/blob/master/03_Linear_Regression/04_Loss_Functions_in_Linear_Regressions/04_lin_reg_l1_vs_l2.py), but we know abs() is not differentiable.
How is this done in TensorFlow? Does it just randomly pick a number in [-1, 1]?
If someone could point me to the implementation, that would be great. Thanks!
(I looked for tensorflow.py in the git repository, but it does not even exist.)

f(x) = abs(x) is differentiable everywhere except at x = 0. Its derivative equals:
f'(x) = -1 for x < 0, and f'(x) = 1 for x > 0.
So the only question is how TensorFlow implements the derivative at x = 0. You can check this manually:
import tensorflow as tf
x = tf.Variable(0.0)
y = tf.abs(x)
grad = tf.gradients(y, [x])[0]
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(grad))
It prints 0.0.
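That result is consistent with the backward rule for abs being the upstream gradient times sign(x), with sign(0) defined as 0 (as far as I know, this is how TensorFlow registers the gradient). The framework-free sketch below mimics that convention in plain Python. The helper names sign and abs_grad are made up for illustration and are not TensorFlow functions. Note that 0 is a valid subgradient of abs at x = 0, since any value in [-1, 1] is.

```python
def sign(x):
    """Return -1.0, 0.0 or 1.0; sign(0) is defined as 0."""
    return float((x > 0) - (x < 0))


def abs_grad(x, upstream=1.0):
    """Backward rule for y = abs(x): upstream gradient times sign(x)."""
    return upstream * sign(x)


# Evaluate the rule on a negative value, zero, and a positive value.
for value in (-3.0, 0.0, 2.5):
    print(value, abs_grad(value))
```

So nothing random happens at x = 0: the rule simply returns a fixed value there.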

A modified version, based on @standy's answer, in which you can change the function yourself:
import tensorflow as tf
x = tf.Variable(0.0)
# Piecewise function: y = x + 2 for x > 0, y = 2 otherwise.
# tf.greater is a strict comparison, so x = 0 takes the constant branch.
y = tf.where(tf.greater(x, 0.0), x + 2.0, 2.0)
grad = tf.gradients(y, [x])[0]
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(grad))
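One hedged way to read the result: a select-style op passes the gradient only through the branch that was chosen. At x = 0 the strict comparison x > 0 is false, so the constant branch is selected and the gradient is 0. The plain-Python sketch below states that rule directly. The helper piecewise_and_grad is hypothetical and only stands in for the graph above.

```python
def piecewise_and_grad(x):
    """Return (y, dy/dx) for y = x + 2 if x > 0 else 2.

    The derivative is taken from whichever branch was selected,
    which mirrors how a where/select op routes gradients.
    """
    if x > 0:
        return x + 2.0, 1.0  # d(x + 2)/dx = 1
    return 2.0, 0.0          # d(2)/dx = 0


for value in (-1.0, 0.0, 1.0):
    print(value, piecewise_and_grad(value))
```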
I would also recommend a post of mine that visualizes L1 and L2 regularization with ECharts:
https://simzhou.com/en/posts/2021/cross-entropy-loss-visualized/

Related

Is the argsort function differentiable in Tensorflow?

By this I mean: can I include it in a loss function and have autodiff work properly?
The raw_ops docs (https://www.tensorflow.org/api_docs/python/tf/raw_ops) have no listing for sort or argsort.
I ran the following experiment in Colab:
import tensorflow as tf
x = tf.constant([[4.0, 2.1, 1.0]])
w = tf.Variable([[1.0, 1.0, 1.0]], name='w')
y_true = tf.constant([[1.0, 2.0, 3.0]])
@tf.function
def loss_fn(y_true, y_pred):
    indices = tf.argsort(y_pred)
    x = tf.gather(y_pred, indices, axis=-1)
    return tf.reduce_sum(tf.square(y_true - x))
with tf.GradientTape() as tape:
    y = x * w
    loss = loss_fn(y_true, y)
tape.gradient(loss, [w])
The computed loss is 1.01, and the gradients for w seem to make sense to me.
So I would say the answer is yes, if you are using argsort() for indexing purposes. If you have something else in mind maybe you can tweak the example above and figure out if the gradients behave as you expect.
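To see why the gradients are sensible, it may help to redo the example by hand. The sketch below is plain Python, not TensorFlow. It assumes the sort indices are treated as constants during the backward pass, so the gradient of the gather step just scatters each entry back to the position it came from. All variable names are local to this sketch.

```python
x = [4.0, 2.1, 1.0]
w = [1.0, 1.0, 1.0]
y_true = [1.0, 2.0, 3.0]

# Forward pass: y = x * w, then sort y via argsort followed by gather.
y = [xi * wi for xi, wi in zip(x, w)]
indices = sorted(range(len(y)), key=lambda i: y[i])  # argsort
y_sorted = [y[i] for i in indices]                   # gather
residual = [t - s for t, s in zip(y_true, y_sorted)]
loss = sum(r * r for r in residual)

# Backward pass: indices are constants, so each gradient entry is
# scattered back to the original position of the value it belongs to.
grad_sorted = [-2.0 * r for r in residual]
grad_y = [0.0] * len(y)
for pos, i in enumerate(indices):
    grad_y[i] += grad_sorted[pos]
grad_w = [g * xi for g, xi in zip(grad_y, x)]

print(loss, grad_w)
```

The gradient only describes how the loss changes while the ordering stays fixed. It carries no information about how the ordering itself would change.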

Sampling from tensor that depends on a random variable in tensorflow

Is it possible to get samples from a tensor that depends on a random variable in tensorflow? I need to get an approximate sample distribution to use in a loss function to be optimized. Specifically, in the example below, I want to be able to obtain samples of Y_output in order to be able to calculate the mean and variance of the output distribution and use these parameters in a loss function.
import tensorflow as tf
# u0, p0, u1, p1 and build_toy_data are defined elsewhere (not shown in the question).
def sample_weight(mean, phi, seed=1):
    P_epsilon = tf.distributions.Normal(loc=0., scale=1.0)
    epsilon_s = P_epsilon.sample([1])
    s = tf.multiply(epsilon_s, tf.log(1.0+tf.exp(phi)))
    weight_sample = mean + s
    return weight_sample
X = tf.placeholder(tf.float32, shape=[None, 1], name="X")
Y_labels = tf.placeholder(tf.float32, shape=[None, 1], name="Y_labels")
sw0 = sample_weight(u0,p0)
sw1 = sample_weight(u1,p1)
Y_output = sw0 + tf.multiply(sw1,X)
loss = tf.losses.mean_squared_error(labels=Y_labels, predictions=Y_output)
train_op = tf.train.AdamOptimizer(0.5e-1).minimize(loss)
init_op = tf.global_variables_initializer()
losses = []
predictions = []
Fx = lambda x: 0.5*x + 5.0
xrnge = 50
xs, ys = build_toy_data(funcx=Fx, stdev=2.0, num=xrnge)
with tf.Session() as sess:
    sess.run(init_op)
    iterations=1000
    for i in range(iterations):
        stat = sess.run(loss, feed_dict={X: xs, Y_labels: ys})
Not sure if this answers your question, but: when you have a Tensor downstream from a sampling Op (e.g., the Op created by your call to P_epsilon.sample([1])), any time you call sess.run on the downstream Tensor, the sample Op will be re-run and produce a new random value. Example:
import tensorflow as tf
from tensorflow_probability import distributions as tfd
n = tfd.Normal(0., 1.)
s = n.sample()
y = s**2
sess = tf.Session() # Don't actually do this -- use context manager
print(sess.run(y))
# ==> 0.13539088
print(sess.run(y))
# ==> 0.15465781
print(sess.run(y))
# ==> 4.7929106
If you want a bunch of samples of y, you could do
import tensorflow as tf
from tensorflow_probability import distributions as tfd
n = tfd.Normal(0., 1.)
s = n.sample(100)
y = s**2
sess = tf.Session() # Don't actually do this -- use context manager
print(sess.run(y))
# ==> vector of 100 squared random normal values
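To get the mean and variance the question asks about, one option is to draw a batch of samples and compute the sample statistics. The sketch below shows the idea with the Python standard library only, using y = s**2 with s drawn from a standard normal, as in the snippet above. The seed and sample count are arbitrary choices for illustration. In a TensorFlow graph, something like tf.nn.moments applied to the sampled tensor would presumably play the same role, but that is my assumption, not part of the answer.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the run is repeatable

# Draw many samples of s ~ Normal(0, 1) and push each through y = s**2.
samples = [rng.gauss(0.0, 1.0) ** 2 for _ in range(20000)]

# Sample estimates of the mean and variance of y.
sample_mean = statistics.mean(samples)
sample_var = statistics.variance(samples)
print(sample_mean, sample_var)
```

For this particular y the exact values are known (mean 1, variance 2), which makes it easy to sanity-check the estimates.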
We also have some cool tools in tensorflow_probability to do the kind of thing you're driving at here. Namely the Bijector API and, somewhat simpler, the trainable_distributions API.
(Another minor point: I'd suggest using tf.nn.softplus, or at a minimum tf.log1p(tf.exp(x)) instead of tf.log(1.0 + tf.exp(x)). The latter has poor numerical properties due to floating point imprecision, which the former are optimized for).
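The numerical issue can be reproduced without TensorFlow. The sketch below compares the literal formula with one common stable rewrite, max(x, 0) + log1p(exp(-|x|)). The function names are made up for this example, and I am not claiming this is the exact formula tf.nn.softplus uses internally.

```python
import math


def naive_softplus(x):
    """log(1 + exp(x)) computed literally."""
    return math.log(1.0 + math.exp(x))


def stable_softplus(x):
    """Equivalent form max(x, 0) + log1p(exp(-|x|)); exp never overflows."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


print(naive_softplus(-40.0))    # exp(-40) is lost when added to 1.0
print(stable_softplus(-40.0))   # small but nonzero
print(stable_softplus(1000.0))  # naive_softplus(1000.0) raises OverflowError
```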
Hope this is some help!

Minimizing a two variables function in tensorflow

I am new to TensorFlow and am looking for tutorials on minimizing an equation.
I tried to implement an example that minimizes a function:
import numpy as np
import tensorflow as tf
# Start both variables from a random scalar.
x = tf.Variable(np.random.randn(), name='x')
y = tf.Variable(np.random.randn(), name='y')
fx = 2*x - 3*y
opt = tf.train.GradientDescentOptimizer(0.1).minimize(fx)
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for i in range(5):
        print(sess.run([x, y]))
        sess.run(opt)
This works well.
But how can I do it for an equation of this type, for example:
e^x + xy = 20
One way would be to minimize the L2 loss of the residual, i.e.:
fx = tf.nn.l2_loss(tf.exp(x)+tf.multiply(x,y)-20)
There are other possibilities of course, but this is an example.
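To make the idea concrete without TensorFlow, here is a small sketch that runs plain gradient descent on 0.5 * (e^x + x*y - 20)^2, the same quantity tf.nn.l2_loss computes for a scalar residual. The gradients are derived by hand with the chain rule. The starting point, learning rate and step count are arbitrary choices for illustration, and the functions residual and solve are hypothetical helpers.

```python
import math


def residual(x, y):
    """Left-hand side minus right-hand side of e^x + x*y = 20."""
    return math.exp(x) + x * y - 20.0


def solve(x=1.0, y=1.0, learning_rate=1e-3, steps=2000):
    """Plain gradient descent on 0.5 * residual(x, y)**2."""
    for _ in range(steps):
        r = residual(x, y)
        grad_x = r * (math.exp(x) + y)  # chain rule: r * dr/dx
        grad_y = r * x                  # chain rule: r * dr/dy
        x -= learning_rate * grad_x
        y -= learning_rate * grad_y
    return x, y


x_sol, y_sol = solve()
print(x_sol, y_sol, residual(x_sol, y_sol))
```

The equation has infinitely many solutions (a whole curve in the x-y plane), so which point the descent reaches depends on where it starts.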

Simple custom gradient with gradient_override_map

I want to use a function that creates weights for a normal dense layer. It basically behaves like an initialization function, except that it "initializes" before every new forward pass.
The flow for my augmented linear layer looks like this:
input = (x, W)
W_new = g(x,W)
output = tf.matmul(x,W_new)
However, g(x,W) is not differentiable, as it involves some sampling. Luckily it also doesn't have any parameters I want to learn so I just try to do the forward and backward pass, as if I would have never replaced W.
Now I need to tell the automatic differentiation to not backpropagate through g(). I do this with:
W_new = tf.stop_gradient(g(x,W))
Unfortunately this does not work, as it complains about non-matching shapes.
What does work is the following:
input = (x, W)
W_new = W + tf.stop_gradient(g(x,W) - W)
output = tf.matmul(x,W_new)
as suggested here: https://stackoverflow.com/a/36480182
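To illustrate what this trick does, here is a toy forward-mode sketch in plain Python. It tracks a value together with its derivative with respect to W and shows that W + stop_gradient(g(W) - W) takes its value from g but its derivative from W. The Dual class, this stop_gradient and the rounding-based g are hypothetical stand-ins written for this example. They are not TensorFlow APIs, and the real g(x, W) from the question is not reproduced here.

```python
class Dual:
    """A value together with its derivative with respect to one input."""

    def __init__(self, value, grad=0.0):
        self.value = value
        self.grad = grad

    def __add__(self, other):
        return Dual(self.value + other.value, self.grad + other.grad)

    def __sub__(self, other):
        return Dual(self.value - other.value, self.grad - other.grad)


def stop_gradient(d):
    """Keep the value but report a zero derivative."""
    return Dual(d.value, 0.0)


def g(w):
    """Hypothetical non-differentiable transform: rounding."""
    return Dual(float(round(w.value)), 0.0)


w = Dual(0.3, grad=1.0)              # dW/dW = 1
w_new = w + stop_gradient(g(w) - w)  # value of g(W), derivative of W

print(w_new.value, w_new.grad)
```

In other words, the forward pass sees g(W) while the backward pass behaves as if W had never been replaced, which matches the behaviour described in the question.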
Now the forward pass seems to be OK, but I don't know how to override the gradient for the backward pass. I know that I have to use gradient_override_map for this, but I could not transfer the applications I have seen to my particular use case (I am still quite new to TF).
I am also not sure whether there is an easier way. I assume something similar has to be done in the first forward pass of a given model, where all weights are initialized and we don't have to backpropagate through the init functions either.
Any help would be very much appreciated!
Hey @jhj, I faced the same problem too; fortunately, I found this gist. Hope this helps :)
Working sample:
import tensorflow as tf
from tensorflow.python.framework import ops
import numpy as np
Define a custom py_func that also takes a grad op as an argument:
def py_func(func, inp, Tout, stateful=True, name=None, grad=None):
    # Need to generate a unique name to avoid duplicates:
    rnd_name = 'PyFuncGrad' + str(np.random.randint(0, 1E+8))
    tf.RegisterGradient(rnd_name)(grad) # see _MySquareGrad for grad example
    g = tf.get_default_graph()
    with g.gradient_override_map({"PyFunc": rnd_name, "PyFuncStateless": rnd_name}):
        return tf.py_func(func, inp, Tout, stateful=stateful, name=name)
Define a custom square function using np.square instead of tf.square:
def mysquare(x, name=None):
    with ops.name_scope(name, "Mysquare", [x]) as name:
        sqr_x = py_func(np.square,
                        [x],
                        [tf.float32],
                        name=name,
                        grad=_MySquareGrad) # <-- here's the call to the gradient
        return sqr_x[0]
Actual gradient:
def _MySquareGrad(op, grad):
    x = op.inputs[0]
    return grad * 20 * x # add a "small" error just to see the difference
Run it and compare the values with the overridden gradient:
with tf.Session() as sess:
    x = tf.constant([1., 2.])
    y = mysquare(x)
    tf.global_variables_initializer().run()
    print(x.eval(), y.eval(), tf.gradients(y, x)[0].eval())

Treating a tensorflow Defun as a closure

I'm having trouble using the Defun decorator in tensorflow. Namely, Defun can't close over any TF ops created outside. Below is a self-contained example showing what I'd like to do. Note that the tensor x belongs to different graphs inside and outside the call to custom_op. The Defun code creates a temporary graph, translates the graph into a function proto, and then merges this into the original graph. The code crashes in the first step, since the tensors that we close over are not in the new temporary graph. Is there a way around this? Being able to close over things would be very helpful.
import tensorflow as tf
from tensorflow.python.framework import function
w = tf.Variable(1.0)
function_factory = lambda x: x*w
@function.Defun(x=tf.float32)
def custom_op(x):
    print('graph for x inside custom_op: ', x.graph)
    return function_factory(x)
x = tf.constant(2.0)
print('graph for x outside custom_op: ', x.graph)
y = custom_op(x)
with tf.Session() as sess:
    sess.run(tf.initialize_all_variables())
    sess.run(y)
No, the Defun decorator does not capture everything. You need to pass in w explicitly, as follows for your example:
import tensorflow as tf
from tensorflow.python.framework import function
w = tf.Variable(1.0)
@function.Defun(tf.float32, tf.float32)
def custom_op(x, w):
    print('graph for x inside custom_op: ', x.graph)
    return x * w
x = tf.constant(2.0)
print('graph for x outside custom_op: ', x.graph)
y = custom_op(x, tf.identity(w))
with tf.Session() as sess:
    sess.run(tf.initialize_all_variables())
    sess.run(y)
(we may add more complete support for capturing if the need is high.)