TensorFlow - optimizing part of a variable

Let's say I'm optimizing Ax = b, where A is a matrix and x, b are vectors.
My question: is it possible to optimize over only a subset of A - specifically, a patch of A?
In other words, I would like to keep a subset of the parameters in A constant.
Is this possible in TensorFlow?
I thought about using tf.slice(), but it creates a new reference to the variable.
Thanks!

Unless I've misunderstood your question (or there's missing context), just define the parts of A you want to optimise over using tf.Variable(), and the parts you don't using tf.constant().
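For example, a minimal sketch of that idea in TF 1.x graph mode (the row split and all shapes below are made up for illustration, not taken from the question):

import numpy as np
import tensorflow as tf

# Hypothetical split: the first 2 rows of A are trainable, the last 3 rows are frozen.
A_train = tf.Variable(np.random.randn(2, 4).astype('float32'), name='A_train')
A_fixed = tf.constant(np.random.randn(3, 4).astype('float32'), name='A_fixed')
A = tf.concat([A_train, A_fixed], axis=0)   # full 5x4 matrix

x = tf.constant(np.random.randn(4, 1).astype('float32'))
b = tf.constant(np.random.randn(5, 1).astype('float32'))

loss = tf.reduce_sum(tf.square(tf.matmul(A, x) - b))
# Only A_train receives gradient updates; A_fixed can never change.
train_op = tf.train.GradientDescentOptimizer(0.01).minimize(loss)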

You can either use tf.stop_gradient or the var_list parameter of your optimizer.
See this answer for more details: https://stackoverflow.com/a/34478044/4554460
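As a rough sketch of the var_list approach (the variable names and shapes here are placeholders, not taken from the linked answer):

import tensorflow as tf

A_patch = tf.Variable(tf.zeros([2, 4]), name='A_patch')   # the part to optimise
A_rest = tf.Variable(tf.ones([3, 4]), name='A_rest')      # the part to keep fixed

A = tf.concat([A_patch, A_rest], axis=0)
x = tf.ones([4, 1])
b = tf.zeros([5, 1])
loss = tf.reduce_sum(tf.square(tf.matmul(A, x) - b))

# Restrict the optimizer to A_patch; A_rest is never updated even though it is a tf.Variable.
opt = tf.train.AdamOptimizer(1e-2)
train_op = opt.minimize(loss, var_list=[A_patch])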

Related

Different optimization behavior using np.random.normal instead of tf.random_normal

I'm looking into the code from https://github.com/AshishBora/csgm and am seeing some strange behavior when using np.random.normal instead of tf.random_normal to initialize a tf.Variable. More concretely:
Instead of
z = tf.Variable(tf.random_normal((batch_size, hparams.n_z)), name='z')
I have
# in mnist_vae/src/model_def.py, line 74
z = tf.Variable(np.random.normal(size=(batch_size, hparams.n_z)).astype('float32'),
                name='z')
z is the variable, which is optimized via the Adam optimizer with respect to an objective.
A bit of background: there is a pre-trained neural network G, whose input z is drawn from a standard normal distribution using tf.random_normal. For a given z*, one wants to solve ẑ = argmin_z ||AG(z) - AG(z*)|| and check the reconstruction error ||G(ẑ) - G(z*)||. The resulting minimal value c(z*) = ||G(ẑ) - G(z*)|| is, for several different z*, quite stable around a value c1. Now, I wasn't sure whether the optimization (Adam optimizer) might be using the information that z comes from a standard normal distribution, so I replaced tf.random_normal with np.random.normal in the hope that the optimizer could then not use that information (see the code above).
Unfortunately, the results are indeed different using np.random.normal: c(z*) = ||G(ẑ) - G(z*)|| is, for several different z*, stable around a different value c2 (not c1). How can one explain this? Is the optimizer really using the information about the normal distribution (e.g. as a log-likelihood prior) in the optimization? My feeling says no, since it's only the initialization.
The code is given in https://github.com/AshishBora/csgm
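For reference, a minimal sketch of the kind of optimization described above (G, A, and all shapes are stand-ins, not the actual csgm code):

import numpy as np
import tensorflow as tf

batch_size, n_z = 1, 20   # placeholder sizes

def G(z):
    # Stand-in for the pre-trained generator; the real network is loaded in csgm.
    return tf.layers.dense(z, 784, activation=tf.nn.sigmoid, name='G', reuse=tf.AUTO_REUSE)

A = tf.constant(np.random.randn(784, 100).astype('float32'))   # measurement matrix
z_star = tf.constant(np.random.normal(size=(batch_size, n_z)).astype('float32'))

# The variable being optimized; this line is where tf.random_normal vs np.random.normal differ.
z = tf.Variable(np.random.normal(size=(batch_size, n_z)).astype('float32'), name='z')

loss = tf.norm(tf.matmul(G(z), A) - tf.matmul(G(z_star), A))
train_op = tf.train.AdamOptimizer(0.01).minimize(loss, var_list=[z])
recon_error = tf.norm(G(z) - G(z_star))   # c(z*) after optimizing z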

Does the TensorFlow gradient compute derivatives of functions with an unknown dependency on the decision variable?

I would appreciate it if you could answer my questions or provide me with useful resources.
Currently, I am working on a problem that requires alternating optimization. So, consider two decision variables x and y. In the first step I take the derivative of the loss function w.r.t. x (for fixed y) and update x. In the second step, I need to take the derivative w.r.t. y. The issue is that x depends on y implicitly, and finding a closed form of the cost function that shows the dependency of x on y is not feasible, so the gradients of the cost function w.r.t. y are unknown.
1) My first question is whether the reverse-mode "autodiff" method used in TensorFlow works for these problems, where we do not have an explicit form of the cost function w.r.t. one variable and we need the derivatives. Actually, the value of the cost function is known, but its dependency on the decision variable is not known in closed form.
2) From a general point of view, if I define a node as a tf.Variable and have an arbitrary, intractable function (intractable to compute by hand) of that variable that evolves through code execution, is it possible to calculate the gradients via tf.gradients? If yes, how can I make sure that it is implemented correctly? Can I check it using TensorBoard?
My model is too complicated, but a simplified form can be described this way: suppose the loss function for my model is L(x). I can code L(x) as a function of x during the construction phase in TensorFlow. However, I also have another variable k that is initialized to zero. The dependency of L(x) on k takes shape as the code runs, so my loss function is actually L(x,k). More importantly, x is a function of k implicitly. (All the optimization is done using GradientDescent.) The problem is that I do not have L(x,k) as a closed-form function, but I do have the value of L(x,k) at each step. I could use numerical methods like FDSA/SPSA, but they are not exact. I just need to make sure, as you said, that there is a path between k and L(x,k), but I do not know how!
TensorFlow gradients only work when the graph connecting x and y (when you're computing dy/dx) has at least one path containing only differentiable operations. In general, if TF gives you a gradient it is correct (otherwise file a bug, but gradient bugs are rare, since the gradient for every differentiable op is well tested and the chain rule is fairly easy to apply).
Can you be a little more specific about what your model looks like? You might also want to use eager execution if your forward computation is too weird to express as a fixed dataflow graph.
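A quick way to check whether such a path exists is to call tf.gradients directly: it returns None for a variable that has no differentiable path to the loss (a minimal sketch with made-up names):

import tensorflow as tf

x = tf.Variable(1.0, name='x')
k = tf.Variable(0.0, name='k')

loss_connected = tf.square(x * k - 2.0)   # k sits on a differentiable path
loss_broken = tf.square(x - 2.0)          # no path from k to this loss

print(tf.gradients(loss_connected, [k]))  # [<tf.Tensor ...>] -> gradient exists
print(tf.gradients(loss_broken, [k]))     # [None]            -> no path, dL/dk is unknown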

Stata output variable to matrix with ebalance

I'm using the ebalance Stata package to calculate post-stratification weights, and I'd like to convert the weights output (_webal, which is generated as a double with format %10.0g) to a matrix.
I'd like to normalize all weights in the "control" group, but I can't seem to convert the variable to a matrix in order to manipulate the weights individually (I'm a novice in Stata, so I was just going to do this using a loop; I'd normally just export and do this in R, but I have to calculate the results within a bootstrap). I can, however, view the individual-level weights produced by the output, and I can use them to calculate sample statistics.
Any ideas, anyone? Thanks so much!
This is not an answer, but it doesn't fit within a comment box.
As a self-described novice in Stata, you are asking the wrong question.
Your problem is that you have a variable that you want to do some calculations on, and since you can't just use R and you don't know how to do those (unspecified) calculations directly in Stata, you have decided that the first step is to create a matrix from the variable.
Your question would be better phrased as a simple description of the relevant portions of your data and the calculation you need to do using that data (ebalance is an obscure distraction that probably lost you a few readers) and where you are stuck.
See also https://stackoverflow.com/help/mcve for a discussion of how to create a minimal complete example, along with a description of the results you expect for that example.

adding gaussian noise to all tensorflow variables

I'm working on a project which needs to evaluate the performance of a CNN/RNN after adding noise to all of its variables. For example, if we have a simple MLP, I want to add random Gaussian noise to all the weight parameters, which is not difficult. However, it doesn't seem easy to manipulate the variables of an RNN. For example, the variables inside tf.contrib.rnn.BasicLSTMCell are encapsulated and not accessible to users.
I found a possible way to do this using tf.train.Saver(). I can print all the variables, including the encapsulated ones. However, how to modify the values of all the variables is still not clear.
Is there an easy way to do this?
You can use tf.trainable_variables (doc) or tf.global_variables (doc) to get those variables, and add noise to them.
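A minimal sketch of that idea in TF 1.x graph mode (the toy model and noise scale below are arbitrary stand-ins):

import tensorflow as tf

# Toy model so that some trainable variables exist; in the real setup this would be
# the CNN/RNN under test (e.g. a BasicLSTMCell unrolled with tf.nn.dynamic_rnn).
x = tf.placeholder(tf.float32, [None, 8])
y = tf.layers.dense(x, 4)

noise_stddev = 0.01   # arbitrary noise scale
add_noise = tf.group(*[
    tf.assign_add(v, tf.random_normal(tf.shape(v), stddev=noise_stddev))
    for v in tf.trainable_variables()   # also picks up encapsulated RNN-cell variables
])

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    sess.run(add_noise)   # perturb every trainable variable in place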

a tricky graph solve in tensorflow

As shown below, I built a graph with two large variables and two input placeholders.
Each step, I want to use the current values of the variables (partial values) and the input placeholders to calculate delta values. The delta values are then applied to the variables using scatter_add.
Problem: the two computation paths are not the same; one needs more computation. The TensorFlow execution engine seems to pick one of the paths arbitrarily: it runs one path, then the other. For example, TF may update variable 0 first, then use this new variable 0 to solve the other path (updating variable 1). This is not what I need.
So, any ideas?
TensorFlow graph: (image omitted)
I found the solution: using tf.control_dependencies() solves this problem.
https://www.tensorflow.org/api_docs/python/tf/control_dependencies
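A minimal sketch of that pattern (the shapes and the delta computations are placeholders, just to show where tf.control_dependencies goes):

import tensorflow as tf

var0 = tf.Variable(tf.zeros([10]), name='var0')
var1 = tf.Variable(tf.zeros([10]), name='var1')
idx = tf.placeholder(tf.int32, [None])
inp = tf.placeholder(tf.float32, [None])

# Compute both deltas from the *current* variable values first...
delta0 = inp + tf.gather(var1, idx)   # placeholder update rule
delta1 = inp + tf.gather(var0, idx)   # placeholder update rule

# ...and only apply the scatter_adds once both deltas exist, so that neither
# update path sees the other's freshly updated variable.
with tf.control_dependencies([delta0, delta1]):
    upd0 = tf.scatter_add(var0, idx, delta0)
    upd1 = tf.scatter_add(var1, idx, delta1)
train_op = tf.group(upd0, upd1)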