As mentioned in the onnxruntime documentation:
Out of the box, ONNXRuntime applies a series of optimizations to the ONNX graph, combining nodes where possible and factoring out constant values (constant folding).
My question is:
Is the exported ONNX computational graph a static graph or a dynamic one?
In TensorBoard, it is possible to plot the computational graph of a deep learning model.
Is it possible to display a value for each node (for example, the norm of the node's output)?
Is it possible to do this in both PyTorch and TensorFlow?
Example (display the computational graph of vgg11, with the torch.norm of the output of each node):
import torch
import torchvision
vgg11 = torchvision.models.vgg11(pretrained=True)
image = torch.randn(8, 3, 224, 224)
out = vgg11(image)
So for the output node, we want the value shown in the computational graph to be
torch.norm(out)
One issue on the PyTorch side is that there is no explicit computational graph to visualize (e.g. with pydot).
This is a poor question, but:
Yes, it is possible.
Access the underlying internal members (the attributes with a _ or __ prefix) and calculate the norm from them.
So yes, it is possible, but the same code will not work across frameworks.
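On the PyTorch side, a public alternative to reaching into underscore-prefixed members is to register forward hooks, which PyTorch calls with each module's output during the forward pass. A minimal sketch, assuming a small stand-in model instead of vgg11 (so no weights are downloaded); the helper name `attach_norm_hooks` is hypothetical:

```python
import torch
import torch.nn as nn


def attach_norm_hooks(model):
    """Register a forward hook on every leaf module that records torch.norm of its output."""
    norms = {}
    handles = []
    for name, module in model.named_modules():
        if any(True for _ in module.children()):
            continue  # skip containers; only leaf modules are treated as graph "nodes"

        def hook(mod, inputs, output, name=name):
            # name=name binds the current loop value instead of the last one
            norms[name] = torch.norm(output.detach()).item()

        handles.append(module.register_forward_hook(hook))
    return norms, handles


# Small stand-in for vgg11 so the sketch runs without downloading weights
model = nn.Sequential(
    nn.Conv2d(3, 4, kernel_size=3),  # 8x8 input -> 6x6 output
    nn.ReLU(),
    nn.Flatten(),
    nn.Linear(4 * 6 * 6, 2),
)
norms, handles = attach_norm_hooks(model)
out = model(torch.randn(8, 3, 8, 8))
for handle in handles:
    handle.remove()  # detach the hooks once the values are collected

for name, value in norms.items():
    print(f"{name}: {value:.4f}")
```

The same helper can be applied to `torchvision.models.vgg11()`. The collected values could then be logged next to the graph, e.g. with `torch.utils.tensorboard.SummaryWriter.add_scalar`; this sketch does not draw them inside TensorBoard's graph view itself.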
One of the major problems I've encountered when converting PyTorch models to TensorFlow through ONNX is slowness, which appears to be related to the input shape, even though I was able to get bit-exact outputs from the two frameworks.
While the PyTorch input shape is B,C,H,W, the TensorFlow input shape is B,H,W,C, where B, C, H and W stand for batch size, channels, height and width, respectively. Technically, I solve the input shape problem easily when working in TensorFlow, using two calls to np.swapaxes:
import numpy as np
# Single image in H,W,C layout, no batch dimension yet
image = np.swapaxes(image, 0, 2)  # Swapping C and H dimensions - result: C,W,H
image = np.swapaxes(image, 1, 2)  # Swapping W and H dimensions - result: C,H,W (like PyTorch)
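The two swaps can also be written as a single np.transpose with an explicit axis order, which extends directly to batched tensors. A small sketch, using a hypothetical 4x5 image with 3 channels:

```python
import numpy as np

# Hypothetical H,W,C image (e.g. as loaded by an image library)
image_hwc = np.arange(4 * 5 * 3).reshape(4, 5, 3)

# Two swaps, as in the question: (H, W, C) -> (C, W, H) -> (C, H, W)
swapped = np.swapaxes(np.swapaxes(image_hwc, 0, 2), 1, 2)

# One transpose with an explicit axis order: (H, W, C) -> (C, H, W)
image_chw = np.transpose(image_hwc, (2, 0, 1))

# Batched version: (B, H, W, C) -> (B, C, H, W), and back again
batch_nhwc = image_hwc[None, ...]
batch_nchw = np.transpose(batch_nhwc, (0, 3, 1, 2))
back_to_nhwc = np.transpose(batch_nchw, (0, 2, 3, 1))

print(image_chw.shape, batch_nchw.shape)  # (3, 4, 5) (1, 3, 4, 5)
```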
The slowness problem seems to be related to differences in how convolution operations are implemented in PyTorch and TensorFlow: PyTorch expects channels first, whereas TensorFlow expects channels last.
As a result, when I visualize the models using Netron, the ONNX model looks clean and makes sense (first image), whereas the TensorFlow .pb model looks like a big mess (second image).
Note: It appears that this problem has already concerned the authors of the onnx2keras library, which has an experimental feature for changing the C,H,W ordering that originates in PyTorch into H,W,C.
Any idea how to overcome this limitation? Are there other options for exporting PyTorch models to TensorFlow at a more abstract level?
ONNX (from PyTorch) - you can see the straight flow and the residual blocks:
TensorFlow (imported from the ONNX model) - almost nothing looks like a series of predefined operations:
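The channels-first versus channels-last difference changes only the memory layout, not the arithmetic. A NumPy sketch using a 1x1 (pointwise) convolution, chosen only for brevity, shows that both layouts give the same numbers once the result is transposed back. A converter that keeps NCHW semantics on an NHWC backend therefore has to move data between the layouts somewhere, which is one plausible source of the extra nodes in the imported graph; all array shapes here are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)
x_nchw = rng.standard_normal((2, 3, 4, 5))  # B, C, H, W (PyTorch layout)
weights = rng.standard_normal((6, 3))       # out_channels, in_channels of a 1x1 kernel

# 1x1 convolution, channels first: out[b, o, h, w] = sum_c W[o, c] * x[b, c, h, w]
y_nchw = np.einsum("oc,bchw->bohw", weights, x_nchw)

# Same convolution, channels last: out[b, h, w, o] = sum_c x[b, h, w, c] * W[o, c]
x_nhwc = np.transpose(x_nchw, (0, 2, 3, 1))
y_nhwc = np.einsum("bhwc,oc->bhwo", x_nhwc, weights)

# Identical values; only the axis order (memory layout) differs
print(np.allclose(np.transpose(y_nhwc, (0, 3, 1, 2)), y_nchw))  # True
```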
I am currently using an existing Keras implementation of a certain model and I would like to study the effects of different multiplication implementations on its computational speed and accuracy.
Is there a simple way to replace the Keras (TensorFlow) multiplication that is used in its Dense and Conv (and other pre-existing) layers with a custom one?
The idea is also to compare training with normal multiplication and testing with custom multiplication against doing both with the custom multiplication.
So I'm looking for a solution that's something like:
import tensorflow as tf
tf.__mul__ = custom_mult
and will replace all multiplication operations in Keras's default layers with my own implementation.
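Assigning `tf.__mul__` would not have this effect: Python's `*` operator is looked up on the tensor's type, not on the `tf` module, and Keras' `Dense` and `Conv` layers call dedicated matrix-multiplication and convolution ops rather than `*`. A more workable route is a custom layer whose forward pass calls the custom multiplication explicitly. The NumPy sketch below shows the idea for a dense layer only; `dense_forward` and `float16_mult` (operands rounded to float16 before multiplying) are hypothetical stand-ins, not Keras APIs:

```python
import numpy as np


def exact_mult(a, b):
    return a * b


def float16_mult(a, b):
    # Hypothetical "custom" multiplication: operands rounded to float16 first
    return (a.astype(np.float16) * b.astype(np.float16)).astype(np.float32)


def dense_forward(x, kernel, bias, mult=exact_mult):
    # x: (batch, in), kernel: (in, out), bias: (out,)
    # products[b, i, o] = mult(x[b, i], kernel[i, o]); summing over i gives x @ kernel
    products = mult(x[:, :, None], kernel[None, :, :])
    return products.sum(axis=1) + bias


rng = np.random.default_rng(0)
x = rng.standard_normal((4, 3)).astype(np.float32)
kernel = rng.standard_normal((3, 2)).astype(np.float32)
bias = np.zeros(2, dtype=np.float32)

y_exact = dense_forward(x, kernel, bias)
y_custom = dense_forward(x, kernel, bias, mult=float16_mult)
print(np.abs(y_exact - y_custom).max())
```

Because the multiplication is a parameter, the same layer can be trained with `exact_mult` and evaluated with `float16_mult`, or use the custom function in both phases.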
I want to add a constraint option to my loss function. The definition of this constraint needs a NumPy array as input, so I cannot define it as a tensor (a graph node) in TensorFlow. How can I define this part in the graph so that it takes part in the network optimization?
Operations done on NumPy arrays cannot be automatically differentiated in TensorFlow. Since you are using this computation as part of the loss computation, I assume you want to differentiate it. In this case, your best option is probably to reimplement the constraint in TensorFlow. The only other approach I can think of is to use autograd in conjunction with TF. This seems possible (something along the lines of: evaluate part of the graph with TF, get NumPy arrays out, call your function under autograd, get the gradients, and feed them back into TF), but it will likely be harder and slower.
If you are reimplementing it in TF, most NumPy operations have straightforward one-to-one counterparts in TF. If the implementation uses a lot of control flow (which can be painful in classic TF), you can use eager execution or py_func.
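A minimal sketch of the reimplementation route, assuming TensorFlow 2 with eager execution. The constraint `constraint_np` (a squared penalty on weights whose magnitude exceeds 1) is a hypothetical stand-in for the real one; its TensorFlow twin uses the one-to-one op counterparts, so `tf.GradientTape` can differentiate it:

```python
import numpy as np
import tensorflow as tf


# Hypothetical NumPy constraint: penalize weights whose magnitude exceeds 1
def constraint_np(w):
    return np.sum(np.maximum(np.abs(w) - 1.0, 0.0) ** 2)


# Same formula with TensorFlow ops (np.maximum(., 0) -> tf.nn.relu, np.sum -> tf.reduce_sum)
def constraint_tf(w):
    return tf.reduce_sum(tf.square(tf.nn.relu(tf.abs(w) - 1.0)))


w = tf.Variable([0.5, -2.0, 3.0])
with tf.GradientTape() as tape:
    penalty = constraint_tf(w)  # would be added to the main loss in practice
grad = tape.gradient(penalty, w)
print(penalty.numpy(), grad.numpy())
```

Here the penalty is 0 + 1 + 4 = 5, and the gradient 2 * relu(|w| - 1) * sign(w) is [0, -2, 4].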
I have a simple question and have already searched quite a bit, but maybe I'm using the wrong keywords.
How does TensorFlow handle a given graph? If one has the simple graph:
import tensorflow as tf
x = tf.constant(1.0, name='input')
w = tf.constant(0.8, name='weight')
b = tf.constant(0.8, name='bias')
y_1 = tf.multiply(w, x, name='output_1')
y_2 = tf.add(y_1, b, name='output_2')
The arithmetic statement is of course given by the computational graph, but does TensorFlow then compile and simplify it, for example to save time by avoiding memory copies? In other words, is a 'condensed' version of the computational kernel executed on the device (CPU or GPU)?
So that it reduces to something like this:
y_2 = tf.add(tf.multiply(w, x), b, name='output_2')
Maybe somebody knows a good resource to learn more about how exactly TensorFlow runs under the hood without looking too deeply into the source code.
Thank you very much in advance!
TensorFlow includes various optimizations that can have the effect of simplifying a dataflow graph. In particular:
TensorFlow will apply common subexpression elimination to avoid performing redundant computation. In the case of your example, this will not have much effect, but TensorFlow will observe that w and b are the same constant, and replace them with a single value.
TensorFlow will apply constant propagation so that (computed) values that are the same in every execution of a subgraph will only be computed once. In your example, the entire expression is a constant, so TensorFlow will replace it with a single tf.constant() value corresponding to the result (1.6).
If you use the experimental XLA compiler, TensorFlow will make more aggressive simplifications, and may be able to replace a subgraph with a single TensorFlow kernel, containing just-in-time compiled code. If in your example x were a tf.placeholder(), the remainder of the computation could be compiled into a single kernel with one input and one output.
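To make the first two optimizations concrete, here is a toy pure-Python sketch of common subexpression elimination followed by constant folding, applied to the question's graph. It is only an illustration of the ideas; the dictionary graph format and function names are hypothetical and say nothing about how TensorFlow's actual graph optimizer (Grappler) is implemented:

```python
# Toy dataflow graph for the question's example: name -> (op, inputs).
# "const" nodes store their value in place of an input list.
# Nodes are listed in topological order (every input appears before its user).
graph = {
    "input": ("const", 1.0),
    "weight": ("const", 0.8),
    "bias": ("const", 0.8),
    "output_1": ("mul", ["weight", "input"]),
    "output_2": ("add", ["output_1", "bias"]),
}

OPS = {"mul": lambda a, b: a * b, "add": lambda a, b: a + b}


def eliminate_common_subexpressions(graph):
    """Merge nodes with the same op and inputs (or the same constant value)."""
    seen, rename, result = {}, {}, {}
    for name, (op, args) in graph.items():
        if op != "const":
            args = [rename.get(a, a) for a in args]  # point at surviving nodes
        key = (op, args if op == "const" else tuple(args))
        if key in seen:
            rename[name] = seen[key]  # duplicate: reuse the earlier node
        else:
            seen[key] = name
            result[name] = (op, args)
    return result, rename


def fold_constants(graph):
    """Replace every node whose inputs are all constants by its computed value."""
    folded = {}
    for name, (op, args) in graph.items():
        if op != "const" and all(folded[a][0] == "const" for a in args):
            folded[name] = ("const", OPS[op](*(folded[a][1] for a in args)))
        else:
            folded[name] = (op, args)
    return folded


cse_graph, rename = eliminate_common_subexpressions(graph)
folded = fold_constants(cse_graph)
print(rename)              # {'bias': 'weight'}
print(folded["output_2"])  # ('const', 1.6)
```

CSE merges `bias` into `weight` because both are the constant 0.8; constant folding then collapses the whole expression into a single constant, matching the 1.6 mentioned in the answer.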