Complete Beta (or Gamma) function in Tensorflow - tensorflow

I've read some articles about using other distributions to model a stochastic policy in reinforcement learning. A Gaussian distribution is the usual choice, but some authors use a Beta distribution: https://en.wikipedia.org/wiki/Beta_distribution
There is already a Beta distribution class in TensorFlow, which lets you use it with tensors.
But some policy gradient methods constrain the optimization process using the Kullback-Leibler divergence.
The formula contains the digamma function, which is already implemented in TensorFlow. But I can't find the beta function (or the gamma function, since the two are linked) in TensorFlow, only log-gamma and incomplete gamma. And I cannot use scipy.special.beta because it cannot operate on tensors (my alpha and beta parameters are produced by a neural network).
I'm no specialist in this field, so perhaps my question is foolish, but I'd really like an explanation.
Thanks a lot
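One way around the missing beta function is the identity B(a, b) = Γ(a)Γ(b) / Γ(a + b), which needs only the log-gamma function mentioned in the question. The sketch below uses Python's standard math.lgamma so that it runs without TensorFlow; the helper names log_beta and beta are made up for illustration. On tensors, the same expression could be written with TensorFlow's element-wise log-gamma op (tf.math.lgamma, or tf.lgamma in older releases). Since the KL divergence between two Beta distributions involves only the logarithm of B, the final exponentiation is usually unnecessary.

```python
import math


def log_beta(a, b):
    """Log of the complete beta function via log-gamma.

    log B(a, b) = lgamma(a) + lgamma(b) - lgamma(a + b)
    Staying in log space avoids overflow for large a and b.
    """
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def beta(a, b):
    """Complete beta function, recovered by exponentiating log_beta."""
    return math.exp(log_beta(a, b))


# B(2, 3) = Gamma(2) * Gamma(3) / Gamma(5) = 1 * 2 / 24 = 1/12
print(beta(2.0, 3.0))
```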

Related

How to integrate a pytorch model into a dynamic optimization, for example in Pyomo or gekko

Let's say I have a pytorch-model describing the evolution of some multidimensional system based on its own state x and an external actuator u. So x_(t+1) = f(x_t, u_t) with f being the artificial neural network from pytorch.
Now I want to solve a dynamic optimization problem to find an optimal sequence of u-values that minimizes an objective depending on x. Something like this:
min sum over all timesteps phi(x_t)
s.t.: x_(t+1) = f(x_t, u_t)
Additionally I also have some upper and lower bounds on some of the variables in x.
Is there an easy way to do this using a dynamic optimization toolbox like pyomo or gekko?
I already wrote some code that transforms a feedforward neural network into a numpy function, which can then be passed as a constraint to pyomo. The problem with this approach is that it requires significant reprogramming effort every time the structure of the neural network changes, so quick testing becomes difficult. Integrating recurrent neural networks also gets difficult because hidden cell states would have to be added as additional variables to the optimization problem.
I think a good solution could be to do the function evaluations and gradient calculations in torch and somehow pass the results to the dynamic optimizer. I'm just not sure how to do this.
Thanks a lot for your help!
TensorFlow or PyTorch models can't be directly integrated into GEKKO at the moment. But I believe you can retrieve the derivatives from TensorFlow and PyTorch, which allows you to pass them to GEKKO.
There is a GEKKO Brain module, with examples in the links below. You can also find an example that uses a GEKKO feedforward neural network for dynamic optimization.
GEKKO Brain Feedforward neural network examples
MIMO MPC example with GEKKO neural network model
A recurrent neural network library in the GEKKO Brain module is currently being developed, which will make it easy to use all of GEKKO's dynamic optimization functions.
In the meantime, you can use a sequential method by wrapping the TensorFlow or PyTorch model in an available optimization solver such as the scipy optimization module.
Check out the link below for a dynamic optimization example with a Keras LSTM model and scipy optimize.
Keras LSTM MPC
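To make the sequential method more concrete, here is a rough sketch in plain Python. Everything in it is an assumption for illustration: f is a hand-written linear stand-in for the trained network, phi is a made-up stage cost, and the optimizer is a crude projected gradient descent with finite-difference gradients rather than scipy or GEKKO. The point is only the structure: the solver sees the control sequence u, and the model is simulated forward inside the objective.

```python
def f(x, u):
    """Hypothetical stand-in for the trained network: x_(t+1) = f(x_t, u_t)."""
    return 0.9 * x + 0.5 * u


def phi(x):
    """Hypothetical stage cost: squared distance from a setpoint of 1.0."""
    return (x - 1.0) ** 2


def rollout_cost(u_seq, x0):
    """Simulate the model forward in time and sum the stage cost."""
    x, total = x0, 0.0
    for u in u_seq:
        x = f(x, u)
        total += phi(x)
    return total


def optimize_controls(x0, horizon, u_min, u_max, steps=500, lr=0.05, eps=1e-6):
    """Projected gradient descent on the control sequence.

    Gradients come from forward finite differences, so only function
    evaluations of the model are needed. Bounds on u are enforced by
    clipping after every step.
    """
    u_seq = [0.0] * horizon
    for _ in range(steps):
        base = rollout_cost(u_seq, x0)
        grad = []
        for i in range(horizon):
            bumped = list(u_seq)
            bumped[i] += eps
            grad.append((rollout_cost(bumped, x0) - base) / eps)
        u_seq = [min(u_max, max(u_min, u - lr * g)) for u, g in zip(u_seq, grad)]
    return u_seq


u_opt = optimize_controls(x0=0.0, horizon=5, u_min=-1.0, u_max=1.0)
print(u_opt, rollout_cost(u_opt, 0.0))
```

With a real network, f would call the model's forward pass and the finite differences could be replaced by gradients from the framework's autodiff. Bounds on the states x are not handled here; they would need a penalty term or a solver that supports general constraints.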

Is there a worked example for neural network pruning for the Faster-RCNN architecture from TensorFlow's object detection api?

I am trying to find a worked example of neural network pruning for the Faster-RCNN architecture.
My core stack is TensorFlow 1.12 and its object_detection API (link), on Python 3.5.2 in Ubuntu 16.04 LTS. I came across some neural network pruning repos (e.g. link, implementing NVIDIA's pruning paper with Taylor expansion link, which looks the most promising but is (a) implemented in PyTorch and (b) aimed at classification networks rather than detectors).
I am also aware of the pruning functionality within TensorFlow under this package (link), but I could only run an example found in the comments of the following StackOverflow question (link) to train and prune (not thoroughly tested) a simple neural network for handwritten digit classification using the MNIST dataset.
I am looking for a worked example and not reporting any bugs or issues in code.
Can someone point me to a worked example of pruning Faster-RCNN, or other detectors, found in TensorFlow's object detection API (link), preferably using TensorFlow's pruning package (link)?
Pruning is orthogonal to the meta-architecture used for object detection. The TensorFlow Object Detection API relies heavily on builders that read the config and create the corresponding nets, classes, etc. I believe you want to prune the feature extractor, since it is the heaviest part. If so, you first need to prune some feature extractor from slim (let's say Inception-V2), give it a name, add its pruned version to models, adjust the proto config, and more. In short, you need to introduce a new type of feature extractor. But I am not aware of any existing examples of that.

How to choose the threshold of the output of a dnn in tensorflow?

I am currently learning to build neural networks with TensorFlow. The library provides a very convenient way to create one with the DNNClassifier estimator, as in this tutorial: https://www.tensorflow.org/get_started/premade_estimators.
However, I can't see how to choose the final threshold of the output layer before making the prediction.
For instance, let's say we have a binary classifier between 'KO' and 'OK'. The end of the neural network computes the probability of each class for a given sample, for instance [0.4,0.6] (so 40% that the answer is 'KO' and 60% that the answer is 'OK'). I assume that the DNN uses a threshold of 0.5 by default, so it will answer 'OK' here. But I want to change this threshold to 0.8, so that if the DNN is not at least 80% sure of 'OK', it will answer 'KO' (in order to tune the FP rate and the FN rate).
How can we do that?
Thanks in advance for your help.
The premade estimators are somewhat rigid. The DNNClassifier, for example, does not provide a mechanism to change the loss function or the rule that turns the output probabilities into a predicted class, as you've discovered.
To modify the logic of how predictions are generated, or to modify your loss function, you'll have to create a custom Estimator. This tutorial walks you through that process.
If you haven't invested too much time learning how to use the Estimator API yet, I recommend you also acquaint yourself with Keras, another high-level API for building and training deep learning models in TensorFlow; you might find it easier to build custom models with Keras rather than Estimators.
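If the per-class probabilities can be read for each sample (the tutorial linked in the question reads a 'probabilities' entry from each prediction), the threshold can also be applied outside the network as a post-processing step. The sketch below is plain Python; the function name classify and its parameters are made up for illustration, and it assumes the probabilities are ordered as [P(KO), P(OK)] like in the question.

```python
def classify(probabilities, ok_threshold=0.8, labels=("KO", "OK")):
    """Return 'OK' only when P(OK) reaches the threshold, otherwise 'KO'.

    probabilities: pair [P(KO), P(OK)] for one sample (assumed ordering).
    """
    p_ok = probabilities[1]
    return labels[1] if p_ok >= ok_threshold else labels[0]


print(classify([0.4, 0.6]))                    # 60% is below the 0.8 threshold
print(classify([0.1, 0.9]))                    # 90% clears the threshold
print(classify([0.4, 0.6], ok_threshold=0.5))  # the default-style 0.5 cut
```

Sweeping ok_threshold over a validation set is then a simple way to trade the FP rate against the FN rate.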

The difference between tf.layers, tf.contrib, and tf.nn in Tensorflow [duplicate]

In TensorFlow 1.4, I found two functions that do batch normalization, and they look the same:
tf.layers.batch_normalization (link)
tf.contrib.layers.batch_norm (link)
Which function should I use? Which one is more stable?
Just to add to the list, there are several more ways to do batch norm in TensorFlow:
tf.nn.batch_normalization is a low-level op. The caller is responsible for handling the mean and variance tensors themselves.
tf.nn.fused_batch_norm is another low-level op, similar to the previous one. The difference is that it's optimized for 4D input tensors, which is the usual case in convolutional neural networks. tf.nn.batch_normalization accepts tensors of any rank greater than 1.
tf.layers.batch_normalization is a high-level wrapper over the previous ops. The biggest difference is that it takes care of creating and managing the running mean and variance tensors, and calls a fast fused op when possible. Usually, this should be the default choice for you.
tf.contrib.layers.batch_norm is the early implementation of batch norm, from before it graduated to the core API (i.e., tf.layers). Using it is not recommended because it may be dropped in future releases.
tf.nn.batch_norm_with_global_normalization is another deprecated op. It currently delegates the call to tf.nn.batch_normalization, but is likely to be dropped in the future.
Finally, there's also Keras layer keras.layers.BatchNormalization, which in case of tensorflow backend invokes tf.nn.batch_normalization.
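To illustrate what "the caller handles mean and variance" means for the low-level op, here is a plain-Python sketch of the batch-norm formula, scale * (x - mean) / sqrt(variance + epsilon) + offset. It is not TensorFlow code; the parameter names loosely mirror those of tf.nn.batch_normalization, and the statistics are computed by hand from a toy batch.

```python
import math


def batch_norm(xs, mean, variance, offset=0.0, scale=1.0, epsilon=1e-3):
    """Normalize each value with caller-supplied statistics.

    The function itself never computes or tracks mean/variance; that is
    the bookkeeping the high-level wrappers take care of.
    """
    inv_std = 1.0 / math.sqrt(variance + epsilon)
    return [scale * (x - mean) * inv_std + offset for x in xs]


batch = [1.0, 2.0, 3.0, 4.0]
mean = sum(batch) / len(batch)
variance = sum((x - mean) ** 2 for x in batch) / len(batch)

# epsilon=0.0 here only so the result has exactly unit variance
normalized = batch_norm(batch, mean, variance, epsilon=0.0)
print(normalized)
```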
As shown in the docs, tf.contrib is a contribution module containing volatile or experimental code. When a function is complete, it is moved out of this module. The two functions currently coexist for compatibility with earlier versions.
So the former, tf.layers.batch_normalization, is recommended.

skipping layer in backpropagation in keras

I am using Keras with tensorflow backend and I am curious whether it is possible to skip a layer during backpropagation but have it execute in the forward pass. So here is what I mean
Lambda (lambda x: a(x))
I want to apply a to x in the forward pass but I do not want a to be included in the derivation when the backprop takes place.
I was trying to find a solution but I could not find anything. Can somebody help me out here?
UPDATE 2
In addition to tf.py_func, there is now an official guide on how to add a custom op.
UPDATE
See this question for an example of writing a custom op with gradient purely in Python without needing to rebuild anything. Note that there are some limitations to the method (see the documentation of tf.py_func).
Not exactly a solution to the problem, but still kind of an answer and too long for comments.
That's not even a Keras issue, but a TensorFlow one. Each op defines its own gradient computation that is used during backpropagation. If you really wanted to do something like that, you would need to implement the op in TensorFlow yourself (no easy feat) and define the gradient that you want, because you can't have "no gradient"; if anything it would be 1 or 0 (otherwise you can't go on with backpropagation). There is a tf.NoGradient function in TensorFlow which causes an op to propagate zeros, but I don't think it is meant to be (or can be) used outside of TensorFlow's own internals.
UPDATE
Okay so a bit more of context. TensorFlow graphs are built of ops, which are implemented by kernels; this is basically a 1-to-1 mapping, except that there may be for example a CPU and a GPU kernel for an op, hence the differentiation. The set of ops supported by TensorFlow is usually static, I mean it can change with newer versions, but in principle you cannot add your own ops, because the ops of a graph go into the Protobuf serialized format, so if you made your own ops then you would not be able to share your graph. Ops are then defined at C++ level with the macro REGISTER_OP (see for example here), and kernels with REGISTER_KERNEL_BUILDER (see for example here).
Now, where do gradients come into play? Well, the funny thing is that the gradient of an op is not defined at C++ level; there are ops (and kernels) that implement the gradient of other ops (if you look at the previous files you'll find ops/kernels with names ending in Grad), but (as far as I'm aware) these are not explicitly "linked" at this level. It seems that the associations between ops and their gradients are defined in Python, usually via tf.RegisterGradient or the aforementioned tf.NoGradient (see for example here; Python modules starting with gen_ are autogenerated with the help of the C++ macros); these registrations tell the backpropagation algorithm how to compute the gradient of the graph.
So, how to actually work this out? Well, you need to create at least one op in C++ with the corresponding kernel(s) implementing the computation that you want for your forward pass. Then, if the gradient computation that you want to use can be expressed with existing TensorFlow ops (which is most likely), you would just need to call tf.RegisterGradient in Python and do the computation there in "standard" TensorFlow. This is quite complicated, but the good news is that it's possible, and there's even an example for it (although I think they kind of forgot the gradient registration part in that one)! As you will see, the process involves compiling the new op code into a library (by the way, I'm not sure whether any of this works on Windows) that is then loaded from Python (obviously this involves going through the painful process of manually compiling TensorFlow with Bazel). A possibly more realistic example can be found in TensorFlow Fold, an extension of TensorFlow for structured data that registers (as of now) one custom operation here through a macro defined here that calls REGISTER_OP, and then in Python loads the library and registers its gradient here through their own registration function defined here, which simply calls tf.NotDifferentiable (another name for tf.NoGradient).
tldr: It is rather hard, but it can be done and there are even a couple of examples out there.
As mentioned in @jdehesa's comments, you can implement your function with an "alternative gradient". Forgive me if my math is not correct, but I think a derivative returning "1" would be the correct way to have no effect on the backpropagation while still passing the learning through. For how to construct it, see here. The example I cited goes further and allows you to construct an activation function from a Python function. So in place of the spiky function, substitute your function a, and in place of its derivative d_spiky, use
def constant(x):
    return 1
So on the forward pass, a is applied in the layer, and on the backward pass 1 is applied, which should simply pass the weight adjustments through.
You can then just create an Activation layer in Keras using this function.
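To see what a derivative of 1 does in the backward pass, here is a tiny hand-written forward/backward example in plain Python. It does not use Keras or TensorFlow; the rounding function a, the single weight w, and the squared-error loss are all made up for illustration. In TensorFlow itself, a commonly used way to get the same effect without a custom op is x + tf.stop_gradient(a(x) - x), which evaluates to a(x) in the forward pass while the gradient flows as if the layer were the identity.

```python
def a(x):
    """Hypothetical forward-only function: rounding.

    Its true derivative is 0 almost everywhere, which would block learning.
    """
    return float(round(x))


def forward_backward(w, inp, target):
    """One manual forward and backward pass with a treated as identity."""
    x = w * inp                        # differentiable layer before a
    y = a(x)                           # forward pass: a is applied
    loss = (y - target) ** 2

    dloss_dy = 2.0 * (y - target)
    dy_dx = 1.0                        # backward pass: a is skipped (derivative 1)
    dloss_dw = dloss_dy * dy_dx * inp  # chain rule down to the weight
    return loss, dloss_dw


loss, grad = forward_backward(w=0.4, inp=2.0, target=3.0)
print(loss, grad)
```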