Elementwise Sampling with map_fn Slow - numpy

Say that I want to sample a matrix with each entry sampled from a distribution defined by an entry in another matrix. I unroll my matrix and apply map_fn to each element. With a relatively small matrix (128 x 128), the following gives me several PoolAllocator warnings (GTX TITAN Black) and does not train in any reasonable amount of time.
def sample(x):
    samples = tf.map_fn(lambda z:
                        tf.random_normal([1], mean=z,
                                         stddev=tf.sqrt(z * (1 - z))),
                        tf.reshape(x, [-1]))  # apply to each element
    return tf.cond(is_training,
                   lambda: tf.reshape(samples, shape=tf.shape(x)),
                   lambda: tf.tanh(x))
Is there a better way to apply an elementwise operation like this?

Your code will run much faster if you can use Tensor-at-a-time operations instead of elementwise operations like tf.map_fn.
Here it looks like you want to sample from a normal distribution for each element, where the parameters of the distribution are different for each value in an input tensor. Try something like this:
def sample(x):
    samples = tf.random_normal(shape=[128, 128]) * tf.sqrt(x * (1 - x)) + x
tf.random_normal() generates a normal distribution with mean 0.0 and standard deviation 1.0 by default. You can use point-wise tensor operations to fix up the standard deviation (by multiplying) and the mean (by adding) for each element. In fact, if you look at how tf.random_normal() is implemented, that's precisely what it does internally.
(You would probably also do better using a Python conditional to distinguish training from test time.)
If you plan to do this sort of thing a lot, you might file a feature request on github asking to generalize tf.random_normal to accept Tensors with more general shapes for mean and stddev. I see no reason why that shouldn't be supported.
Hope that helps!

See the tensorflow.contrib.distributions module, which has a Normal class with a sample method that does this for you.
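For example, a rough sketch of the original sample() built on top of that module (parameter names vary across TF versions, e.g. older releases use mu/sigma instead of loc/scale, so treat this as an outline):
import tensorflow as tf

def sample(x, is_training):
    # One independent normal per element of x, parameterized elementwise.
    dist = tf.contrib.distributions.Normal(loc=x, scale=tf.sqrt(x * (1 - x)))
    return tf.cond(is_training,
                   lambda: dist.sample(),  # one draw per element, same shape as x
                   lambda: tf.tanh(x))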

Related

Complex convolution in tensorflow

I'm trying to run a simple convolution but with complex numbers:
r = np.random.random([1,10,10,10])
i = np.random.random([1,10,10,10])
x = tf.complex(r,i)
conv_layer = tf.layers.conv2d(
    inputs=x,
    filters=10,
    kernel_size=[3, 3],
    kernel_initializer=utils.truncated_normal_complex(),
    activation=tf.nn.sigmoid)
However I get this error:
TypeError: Value passed to parameter 'input' has DataType complex128 not in list of allowed values: float16, float32
Does anyone know how to implement such a convolution in Tensorflow?
Will I need to implement a custom op, or is there some better option here?
Frustratingly, complex matrix multiplication is possible, e.g. the following runs fine:
def r():
    return np.random.random([10, 10])

A = tf.complex(r(), r())
B = tf.complex(r(), r())
C = tf.multiply(A, B)
sess.run(C)
So there's no real reason convolution shouldn't work, I would think (as convolution is essentially just matrix multiplication).
Thanks
Probably too late, but for anyone who is still interested: applying convolutions to complex-valued data is not as straightforward as with the usual data types, like float32. There are studies that investigate different network structures for this purpose (for example, see this link for "Deep Complex U-Net"). There are implementations of these structures in PyTorch and TensorFlow.
All complex-valued features are split into either Cartesian (real, imaginary) or polar (modulus, angle) representations. Nobody is really trying to use a single feature that is purely complex; I would love to be proven wrong!
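If you do go the splitting route, one common trick is to expand the complex product (a + bi)(w_r + w_i i) and implement one complex convolution as four real-valued convolutions. A rough sketch under that assumption (complex_conv2d is a made-up helper name, and sharing the two real-valued kernels via reuse= is just one way to tie the real and imaginary parts together):
import tensorflow as tf

def complex_conv2d(x_real, x_imag, filters, kernel_size, scope='cplx_conv'):
    # Two real-valued kernels play the roles of the real and imaginary parts
    # of one complex kernel; reuse=True makes the second call share weights.
    with tf.variable_scope(scope):
        conv_r = lambda x, reuse: tf.layers.conv2d(
            x, filters, kernel_size, name='real_kernel', reuse=reuse)
        conv_i = lambda x, reuse: tf.layers.conv2d(
            x, filters, kernel_size, name='imag_kernel', reuse=reuse)
        # (a + bi) * (w_r + w_i i) = (a*w_r - b*w_i) + (a*w_i + b*w_r) i
        out_real = conv_r(x_real, None) - conv_i(x_imag, None)
        out_imag = conv_i(x_real, True) + conv_r(x_imag, True)
    return out_real, out_imag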

taking the gradient in Tensorflow, tf.gradient

I am using this function of tensorflow to get my function jacobian. Came across two problems:
1. The TensorFlow documentation contradicts itself in the following two paragraphs, if I am not mistaken:
gradients() adds ops to the graph to output the partial derivatives of ys with respect to xs. It returns a list of Tensor of length len(xs) where each tensor is the sum(dy/dx) for y in ys.
Returns:
A list of sum(dy/dx) for each x in xs.
According to my test, it does, in fact, return a vector of length len(ys), which is the sum(dy/dx) for each x in xs.
2. I do not understand why they designed it in a way that the return is the sum of the columns (or rows, depending on how you define your Jacobian).
3. How can I really get the Jacobian?
4. In the loss, I need the partial derivative of my function with respect to the input (x), but when I am optimizing with respect to the network weights, I define x as a placeholder whose value is fed later, and the weights are variables. In this case, can I still define the symbolic derivative of the function with respect to the input (x) and put it in the loss? (Later, when we optimize with respect to the weights, this will bring in a second-order derivative of the function.)
1. I think you are right and there is a typo there; it was probably meant to be "of length len(ys)".
2. For efficiency. I can't explain exactly the reasoning, but this seems to be a pretty fundamental characteristic of how TensorFlow handles automatic differentiation. See issue #675.
3. There is no straightforward way to get the Jacobian matrix in TensorFlow. Take a look at this answer and, again, issue #675. Basically, you need one call to tf.gradients per column/row; see the sketch after this list.
4. Yes, of course. You can compute whatever gradients you want; there is no real difference between a placeholder and any other operation, really. There are a few operations that do not have a gradient because it is not well defined or not implemented (in which case they will generally return 0), but that's all.
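A minimal sketch of the per-row approach from point 3 (the jacobian helper is my own name, not a TensorFlow function; it assumes y is a rank-1 tensor with a statically known length):
import tensorflow as tf

def jacobian(y, x):
    rows = []
    for i in range(int(y.shape[0])):
        # Gradient of the i-th output w.r.t. all inputs -> i-th Jacobian row.
        rows.append(tf.gradients(y[i], x)[0])
    return tf.stack(rows)

x = tf.placeholder(tf.float32, shape=[3])
y = tf.stack([x[0] * x[1], x[1] + x[2]])  # toy vector-valued function
J = jacobian(y, x)                        # shape [2, 3], one tf.gradients call per row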

How to use StreamingDataFeeder as contrib.learn.Estimator.fit()'s input_fn?

I have recently started using the tensorflow.contrib.learn (skflow) library and really like it. However, I am facing an issue with Estimator: the fit function accepts either
(X, Y, and batch_size) - the problem with this approach is that it does not provide a way to specify the number of epochs or to use an arbitrary source of data.
input_fn - besides setting epochs, this gives me much more flexibility over the source of the training data (which in my case comes directly from a database).
Now, I am aware that I could create an input_fn which reads files; however, as I am not interested in dealing with files, the following functions are not useful for me:
tf.contrib.learn.read_batch_examples
tf.contrib.learn.read_batch_features
tf.contrib.learn.read_batch_record_features
Ideally, I would like to use StreamingDataFeeder as input_fn. Any ideas how I can achieve this?
StreamingDataFeeder is used when you provide iterators as x / y to fit/predict/evaluate of Estimator.
Example:
x = (np.array([i]) for i in xrange(10**10))      # use range for Python >= 3
y = (np.array([i + 1]) for i in xrange(10**10))
lr = tf.contrib.learn.LinearRegressor(
    feature_columns=[tf.contrib.layers.real_valued_column('')])
# only consumes 1000 * 10 values from the iterators
lr.fit(x, y, steps=1000, batch_size=10)
If you want to use input_fn for feeding data, you need to use graph operations to read / process the data. For example, you can create a C++ op that produces your data (it could listen on a port or read from a database) and converts it into a Tensor. This is mainly intended for reading data from files, but other readers can be implemented as well.
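If a custom op is overkill, a rough, hypothetical sketch of the same idea is to wrap an arbitrary Python callable (e.g. a database query) with tf.py_func inside an input_fn; make_input_fn and fetch_batch below are made-up names for illustration:
import numpy as np
import tensorflow as tf

def fetch_batch():
    # Stand-in for a database query; returns (features, labels) as numpy arrays.
    data = np.random.rand(10, 1).astype(np.float32)
    return data, data + 1.0

def make_input_fn(fetch_fn):
    def input_fn():
        x, y = tf.py_func(fetch_fn, [], [tf.float32, tf.float32])
        x.set_shape([None, 1])  # restore static shape info lost by py_func
        y.set_shape([None, 1])
        return {'': x}, y
    return input_fn

lr = tf.contrib.learn.LinearRegressor(
    feature_columns=[tf.contrib.layers.real_valued_column('')])
lr.fit(input_fn=make_input_fn(fetch_batch), steps=1000)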

What is the best way to implement weight constraints in TensorFlow?

Suppose we have weights
x = tf.Variable(np.random.random((5,10)))
cost = ...
And we use the GD optimizer:
upds = tf.train.GradientDescentOptimizer(lr).minimize(cost)
session.run(upds)
How can we implement for example non-negativity on weights?
I tried clipping them:
upds = tf.train.GradientDescentOptimizer(lr).minimize(cost)
session.run(upds)
session.run(tf.assign(x, tf.clip_by_value(x, 0, np.infty)))
But this slows down my training by a factor of 50.
Does anybody know a good way to implement such constraints on the weights in TensorFlow?
P.S.: in the equivalent Theano algorithm, I had
T.clip(x, 0, np.infty)
and it ran smoothly.
You can take the Lagrangian approach and simply add a penalty for features of the variable you don't want.
e.g. To encourage theta to be non-negative, you could add the following to the optimizer's objective function.
added_loss = -tf.minimum( tf.reduce_min(theta),0)
If any element of theta is negative, then added_loss will be positive; otherwise it is zero. Scaling it to a meaningful value is left as an exercise to the reader. Scaling too little will not exert enough pressure; scaling too much may make things unstable.
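For concreteness, a minimal sketch of wiring the penalty into the training objective; data_cost here is just a stand-in for your real cost, and penalty_weight is an assumed scale you would tune:
import numpy as np
import tensorflow as tf

theta = tf.Variable(np.random.random((5, 10)), dtype=tf.float32)
target = tf.constant(np.random.random((5, 10)), dtype=tf.float32)

data_cost = tf.reduce_sum(tf.square(theta - target))  # stand-in for the real cost
added_loss = -tf.minimum(tf.reduce_min(theta), 0.)    # > 0 only if some theta < 0
penalty_weight = 10.0                                 # assumed scale; tune per problem

train_op = tf.train.GradientDescentOptimizer(0.01).minimize(
    data_cost + penalty_weight * added_loss)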
As of TensorFlow 1.4, there is a new argument to tf.get_variable that allows you to pass a constraint function, which is applied after the optimizer's update. Here is an example that enforces a non-negativity constraint:
with tf.variable_scope("MyScope"):
    v1 = tf.get_variable("v1", …, constraint=lambda x: tf.clip_by_value(x, 0, np.infty))
constraint: An optional projection function to be applied to the variable after being updated by an Optimizer (e.g. used to implement norm constraints or value constraints for layer weights). The function must take as input the unprojected Tensor representing the value of the variable and return the Tensor for the projected value (which must have the same shape). Constraints are not safe to use when doing asynchronous distributed training.
By running
sess.run(tf.assign(x, tf.clip_by_value(x, 0, np.infty)))
you are constantly adding new nodes to the graph and making it slower and slower.
Instead, you can just define a clip_op once when building the graph and run it each time after updating the weights:
# build the graph
x = tf.Variable(np.random.random((5, 10)))
loss = ...
train_op = tf.train.GradientDescentOptimizer(lr).minimize(loss)
clip_op = tf.assign(x, tf.clip_by_value(x, 0, np.infty))

# train
sess.run(train_op)
sess.run(clip_op)
I recently had this problem as well. I discovered that you can import keras, which has nice weight constraint functions, and use them directly as the kernel_constraint in tensorflow. Here is an example from my code. You can do similar things with kernel_regularizer.
from keras.constraints import non_neg

conv1 = tf.layers.conv2d(
    inputs=features['x'],
    filters=32,
    kernel_size=[5, 5],
    strides=2,
    padding='valid',
    activation=tf.nn.relu,
    kernel_regularizer=None,
    kernel_constraint=non_neg(),
    use_bias=False)
There is a practical solution: you can write your cost function yourself to put a high cost on negative weights. I did this in a matrix factorization model in TensorFlow with Python, and it worked well enough. Right? I mean, it's obvious. But nobody else mentioned it, so here you go. EDIT: I just saw that Mark Borderding also gave a loss- and cost-based solution implementation before I did.
And if "the best way" is wanted, as the OP asked, what then? Well, "best" might actually be application-specific, in which case you'd need to try a few different ways with your dataset and consider your application requirements.
Here is working code for increasing the cost for unwanted negative solution variables:
cost = tf.reduce_sum(keep_loss) + Lambda * reg  # cost = sum of losses for training set, except missing data
if prefer_nonneg:  # optionally increase cost for negative values in rhat, if you want that
    negs_indices = tf.where(rhat < tf.constant(0.0))
    neg_vals = tf.gather_nd(rhat, negs_indices)
    cost += 2. * tf.reduce_sum(tf.abs(neg_vals))  # 2 is a magic number (empirical parameter)
You are free to use my code but please give me some credit if you choose to use it. Give a link to this answer on stackoverflow.com please.
This design would be considered a soft constraint, because you can still get negative weights, if you let it, depending on your cost definition.
It seems that constraint= is also available in TF v1.4+ as a parameter to tf.get_variable(), where you can pass a function like tf.clip_by_value. This seems like another soft constraint, not a hard constraint, in my opinion, because it depends on your function working well or not. It also might be slow: the other answerer tried the same clipping function and reported that it slowed training considerably, although they didn't use the constraint= parameter to do it. I don't see any reason why one would be faster than the other, since they both use the same clipping approach, so if you use the constraint= parameter you should expect slow training in the context of the original poster's application.
It would be nicer if TF also provided true hard constraints in its API and let TF figure out how to implement them efficiently on the back end. I mean, I have seen this done in linear programming solvers for a long time already. The application declares a constraint, and the back end makes it happen.

Error when computing eigenvalues of a scipy LinearOperator: "gmres did not converge"

I'm trying to solve a large eigenvalue problem with Scipy where the matrix A is dense but I can compute its action on a vector without having to assemble A explicitly. So in order to avoid memory issues when the matrix A gets big I'd like to use the sparse solver scipy.sparse.linalg.eigs with a LinearOperator that implemements this action.
Applying eigs to an explicit numpy array A works fine. However, if I apply eigs to a LinearOperator instead then the iterative solver fails to converge. This is true even if the matvec method of the LinearOperator is simply matrix-vector multiplication with the given matrix A.
A minimal example illustrating the failure is attached below (I'm using shift-invert mode because I am interested in the smallest few eigenvalues). This computes the eigenvalues of a random matrix A just fine, but fails when applied to a LinearOperator that is directly converted from A. I tried to fiddle with the parameters for the iterative solver (v0, ncv, maxiter) but to no avail.
Am I missing something obvious? Is there a way to make this work? Any suggestions would be highly appreciated. Many thanks!
Edit: I should clarify what I mean by "make this work" (thanks, Dietrich). The example below uses a random matrix for illustration. However, in my application I know that the eigenvalues are almost purely imaginary (or almost purely real if I multiply the matrix by 1j). I'm interested in the 10-20 smallest-magnitude eigenvalues, but the algorithm doesn't behave well (i.e., never stops even for small-ish matrix sizes) if I specify which='SM'. Therefore I'm using shift-invert mode by passing the parameters sigma=0.0, which='LM'. I'm happy to try a different approach so long as it allows me to compute a bunch of smallest-magnitude eigenvalues.
from scipy.sparse.linalg import eigs, LinearOperator, aslinearoperator
import numpy as np
# Set a seed for reproducibility
np.random.seed(0)
# Size of the matrix
N = 100
# Generate a random matrix of size N x N
# and compute its eigenvalues
A = np.random.random_sample((N, N))
eigvals = eigs(A, sigma=0.0, which='LM', return_eigenvectors=False)
print eigvals
# Convert the matrix to a LinearOperator
A_op = aslinearoperator(A)
# Try to solve the same eigenproblem again.
# This time it produces an error:
#
# ValueError: Error in inverting M: function gmres did not converge (info = 1000).
eigvals2 = eigs(A_op, sigma=0.0, which='LM', return_eigenvectors=False)
I tried running your code but without passing the sigma parameter to eigs(), and it ran without problems (read the eigs() docs for its meaning). I didn't see the benefit of it in your example.
eigs() can already find the smallest eigenvalues first. Set which='SM'.
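A concrete sketch of that suggestion: drop sigma so no inner gmres solve is needed, and switch which to 'SM' when you want the smallest-magnitude eigenvalues (which, as the question notes, can be slow to converge):
from scipy.sparse.linalg import eigs, aslinearoperator
import numpy as np

np.random.seed(0)
A = np.random.random_sample((100, 100))
A_op = aslinearoperator(A)

# No sigma: regular ARPACK mode, which works with the LinearOperator directly.
eigvals = eigs(A_op, k=6, which='LM', return_eigenvectors=False)
print(eigvals)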