I am implementing element-wise operations in TensorFlow. Many TensorFlow operations, e.g. add, support NumPy-style broadcasting. Broadcasting is possible if the following rule is respected:
When operating on two tensors, their shapes should be compared element-wise. The procedure starts with the trailing dimensions, and works its way forward. Two dimensions are compatible when they are equal, or one of them is 1. If these conditions are not met, an exception is thrown, indicating that the tensors have incompatible shapes. The size of the resulting tensor is the maximum size along each dimension of the input arrays.
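To make the rule concrete, here is a rough sketch of the same check written out in plain Python (my own illustration, not TensorFlow code):

def broadcast_shape(shape_a, shape_b):
    """Return the broadcast shape of two shapes, or raise if incompatible."""
    result = []
    # Walk both shapes from the trailing dimension forward.
    for a, b in zip(reversed(shape_a), reversed(shape_b)):
        if a == b or a == 1 or b == 1:
            result.append(max(a, b))
        else:
            raise ValueError("incompatible shapes: %s vs %s" % (shape_a, shape_b))
    # The longer shape contributes its remaining leading dimensions unchanged.
    longer = shape_a if len(shape_a) > len(shape_b) else shape_b
    result.extend(reversed(longer[:abs(len(shape_a) - len(shape_b))]))
    return list(reversed(result))

print(broadcast_shape([8, 1, 3], [5, 3]))   # -> [8, 5, 3]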
Does the TensorFlow C++ API provide any method for checking the broadcast compatibility of two tensors? Or, if not, what is the fastest way to do that?
All element-wise binary operations' kernel implementations in TensorFlow derive from the BinaryOpShared class, which does the compatibility checking via the helper class BinaryOpState. Perhaps you can simply derive your kernel class from BinaryOpShared and get the compatibility checking for free.
I am trying to implement a custom convolution operation in TensorFlow with C++ and CUDA, and I found that the back-propagation for Conv2D in TensorFlow is implemented via two separate operations. Indeed, I found there are two operation implementations, namely conv_grad_filter_ops.cc and conv_grad_input_ops.cc, in the TensorFlow source code, which means the gradients for the filter and the input are calculated separately. May I ask what the idea behind this implementation is? Why were they not simply merged into one single operation?
Alright, I did a test and found that there is about a 30% speed boost if the back-propagation for the different inputs is split into separate TF ops compared with being wrapped into one single TF op. This goes against intuition; perhaps it is related to TF's architecture. Note: my test was based on CUDA im2col/col2im with cuBLAS instead of cuDNN.
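For reference, the split is also visible from the Python side: the two ops implemented by conv_grad_input_ops.cc and conv_grad_filter_ops.cc are exposed separately. A rough sketch using the TF 1.x names (shapes are illustrative only):

import tensorflow as tf

x = tf.placeholder(tf.float32, [8, 32, 32, 3])     # forward input
w = tf.placeholder(tf.float32, [3, 3, 3, 16])      # filter
dy = tf.placeholder(tf.float32, [8, 32, 32, 16])   # gradient w.r.t. conv output

# Gradient w.r.t. the input is one op...
dx = tf.nn.conv2d_backprop_input(
    input_sizes=tf.shape(x), filter=w, out_backprop=dy,
    strides=[1, 1, 1, 1], padding='SAME')

# ...and the gradient w.r.t. the filter is a separate op.
dw = tf.nn.conv2d_backprop_filter(
    input=x, filter_sizes=tf.shape(w), out_backprop=dy,
    strides=[1, 1, 1, 1], padding='SAME')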
In TensorFlow 1.4, I found two functions that do batch normalization and they look the same:
tf.layers.batch_normalization
tf.contrib.layers.batch_norm
Which function should I use? Which one is more stable?
Just to add to the list, there're several more ways to do batch-norm in tensorflow:
tf.nn.batch_normalization is a low-level op. The caller is responsible for handling the mean and variance tensors themselves.
tf.nn.fused_batch_norm is another low-level op, similar to the previous one. The difference is that it's optimized for 4D input tensors, which is the usual case in convolutional neural networks. tf.nn.batch_normalization accepts tensors of any rank greater than 1.
tf.layers.batch_normalization is a high-level wrapper over the previous ops. The biggest difference is that it takes care of creating and managing the running mean and variance tensors, and calls a fast fused op when possible. Usually, this should be the default choice for you (see the usage sketch after this list).
tf.contrib.layers.batch_norm is the early implementation of batch norm, from before it graduated to the core API (i.e., tf.layers). Its use is not recommended because it may be dropped in future releases.
tf.nn.batch_norm_with_global_normalization is another deprecated op. Currently it delegates the call to tf.nn.batch_normalization, but it is likely to be dropped in the future.
Finally, there is also the Keras layer keras.layers.BatchNormalization, which in the case of the TensorFlow backend invokes tf.nn.batch_normalization.
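A minimal usage sketch for the recommended tf.layers.batch_normalization option, including the update-ops dependency needed for the running statistics (TF 1.x style; the loss here is just a stand-in for your own):

import tensorflow as tf

x = tf.placeholder(tf.float32, [None, 32, 32, 3])
is_training = tf.placeholder(tf.bool)

# Creates and manages the running mean/variance and uses the fused op when possible.
h = tf.layers.batch_normalization(x, training=is_training)

loss = tf.reduce_mean(tf.square(h))  # stand-in for a real loss

# The moving-average updates live in UPDATE_OPS and must run with the train step.
update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
with tf.control_dependencies(update_ops):
    train_op = tf.train.AdamOptimizer(1e-3).minimize(loss)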
As shown in the docs, tf.contrib is a contribution module containing volatile or experimental code. When a function is complete, it is removed from this module. Right now both exist, in order to stay compatible with historical versions.
So, the former tf.layers.batch_normalization is recommended.
If I am defining a custom op in TensorFlow, is it possible to provide two kernels for the op that are polymorphic on whether the shapes of the inputs are fully defined? For example, I could construct certain structures once at kernel construction time if the shape is fully known/defined.
It's not currently possible to do this. The kernel dispatch mechanism is implemented in a low-level part of the TensorFlow code where information about tensor shapes is not (generally) available.
However, the ability to specialize a graph based on known shapes does seem useful, and it might be worth raising this as a feature request on the GitHub issues page. One possible workaround would be to register an optimization pass that makes use of shape information and rewrites ops whose input shapes are known to a different op that relies on static shape information (e.g. via an additional attr). However, doing this in TensorFlow currently requires you to rebuild from source.
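If the decision can be made while the graph is being built from Python, a crude version of that workaround is to branch on the static shape there and emit whichever op you want. A sketch, where my_static_op and my_dynamic_op are hypothetical stand-ins for your two custom kernels:

import tensorflow as tf

def apply_my_op(x):
    # Hypothetical ops: "my_static_op" would be the kernel specialized for fully
    # known shapes, "my_dynamic_op" the general one. Neither is a real TF op.
    if x.get_shape().is_fully_defined():
        return my_static_op(x, static_shape=x.get_shape().as_list())
    return my_dynamic_op(x)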
I've read the XLA prerelease document here.
https://www.tensorflow.org/versions/master/resources/xla_prerelease#xla_accelerated_linear_algebra
It discusses datatypes of elements, but does not go into much detail about the data organization of the tensors themselves. How will operations on SparseTensor objects be handled once XLA is available?
The layouts restrict the data organization of input and output tensors and don't include sparse layouts, although as Jingyue suggests, they could be extended in the future. The internal representation of tensors in the AST can in principle be anything a backend wants, and it is expected that the compiler may reorganize the data to different layouts for the convenience of different operators implemented by different backends.
I am not aware that anyone has put much thought into how to do this efficiently for sparse tensors. In principle maybe it could be done as a compiler pass to infer sparsity and propagate it, with sparse implementations for all the relevant operators. Nothing like that exists today.
No, XLA focuses on dense tensors and doesn't deal with sparse tensors in an efficient way today.
It could be easily extended to allow users to express some sparsity using layouts (e.g. interior padding).
Sparse data is something we'd like to have working, though it has some challenges. E.g. currently XLA depends on knowing the exact size of every buffer statically. We could certainly find a way to deal with that, but have been focusing on dense data so far.
A few years later, XLA seems to have some support for sparse tensors, and it works well at that. My workflow involves sparse tensors for very high-dimensional data that would be prohibitive to keep in memory, then slicing and manipulating, and finally performing math ops on a lower-dimensional dense tensor. For slicing sparse tensors I'm getting roughly a 4x speed-up with XLA.
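For what it's worth, a sketch of that kind of workflow in recent TF (whether the sparse slicing itself gets compiled by XLA or falls back to regular kernels depends on the version; here only the dense math is explicitly jit-compiled):

import tensorflow as tf

# Illustrative shapes only.
sp = tf.sparse.SparseTensor(indices=[[0, 0], [1, 2]],
                            values=[1.0, 2.0],
                            dense_shape=[4, 10])

# Slice the high-dimensional sparse tensor down to a small piece...
part = tf.sparse.slice(sp, start=[0, 0], size=[2, 10])
dense = tf.sparse.to_dense(part)

# ...then do the heavy math on the dense slice under XLA.
@tf.function(jit_compile=True)
def dense_math(x):
    return tf.reduce_sum(tf.square(x))

print(dense_math(dense).numpy())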
I was looking through the API in TensorFlow and noticed that a lot of mathematical operations that already exist in Python and numpy have been re-implemented (or at least given a TensorFlow interface), for example tf.add and tf.matmul.
Is there a good reason to do this?
I've been searching through their pages but can't find why they'd do this.
I do have some guesses though. One of my main guesses is that they probably want those operations to have some back-propagation effect on whatever neural network graph gets implemented. In other words, to have their derivatives implemented. Is this one of the reasons? (I wish I knew how to even check if my guess is right.)
For example, in one of the most basic examples of linear regression, one defines the prediction function that one wants to implement:
product = tf.matmul(x,W)
y = product + b
instead of
product = tf.matmul(x,W)
y = tf.add(product, b)
Somehow the first implementation does not interfere with the stochastic gradient descent algorithm used for training, so it probably doesn't matter whether one uses numpy or tf.add to train? This is one aspect that confuses me: when do I know which one I should be using?
Or maybe there are performance reasons? Or maybe it's to give those operations access to the GPU when GPUs are required?
You have to understand that you create a tensorflow graph with these operations, meaning they aren't the same as the numpy functions; they are more an abstraction of them.
Maybe you have noticed that you have to create a session and then evaluate the functions through that session to get a result, whereas numpy functions are executed directly. This is because the graph and its functions define what to do, like writing down a formula, but to get results for a specific x (or whatever) you have to insert a value for x. This is what you are doing through the session and eval.
So to conclude: with tensorflow you define a graph, which is a more abstract representation of the functions, and the graph isn't executed when it is defined; it is executed when you call the eval function and through that run the session.
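A minimal sketch of that two-step workflow, i.e. defining the graph first and only executing it through a session (TF 1.x style, matching the question):

import tensorflow as tf

a = tf.constant(2.0)
b = tf.constant(3.0)
c = tf.add(a, b)          # only builds a node in the graph; nothing is computed yet

with tf.Session() as sess:
    print(sess.run(c))    # the graph is actually executed here -> 5.0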
Also notice that you can't mix numpy functions and tensorflow functions directly, but you can define your own tensorflow functions (https://www.tensorflow.org/versions/r0.9/how_tos/adding_an_op/index.html).
Btw I guess most of the tensorflow functions are using numpy under the hood. :)
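On the specific `y = product + b` vs `tf.add(product, b)` point from the question: on tf.Tensor objects the `+` operator is overloaded to build the same kind of TensorFlow add op, so both forms stay inside the graph and both are differentiable. A minimal sketch (TF 1.x style):

import tensorflow as tf

x = tf.placeholder(tf.float32, [None, 3])
W = tf.Variable(tf.zeros([3, 1]))
b = tf.Variable(tf.zeros([1]))

product = tf.matmul(x, W)
y1 = product + b           # '+' on tensors dispatches to a TensorFlow add op
y2 = tf.add(product, b)    # explicit spelling of the same thing

# Both versions are differentiable, which is what the training step relies on.
grads1 = tf.gradients(y1, [W, b])
grads2 = tf.gradients(y2, [W, b])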