https://software.intel.com/en-us/forums/computer-vision/topic/785538
"The problem has been resolved. It's because the model I use uses channels_first as default for GPU training, while OPENVINO requires channels_last for TF models."
What do these mean?
How can I change them?
I cannot find any further references to this on the net.
Channels first means that in a given tensor (consider a photo), you would have (Number_Of_Channels, Height, Width).
Channels last means the channels are in the last position of the tensor (an n-dimensional array).
Examples:
(3,360,720) --- Channels first
(360,720,3) --- Channels last
where 3 comes from RGB (coloured image).
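For illustration, a minimal NumPy sketch (the array name and sizes below are just made-up examples) converting between the two layouts:
import numpy as np

img_chw = np.zeros((3, 360, 720))            # channels first: (C, H, W)
img_hwc = np.transpose(img_chw, (1, 2, 0))   # channels last:  (H, W, C) -> (360, 720, 3)
img_back = np.transpose(img_hwc, (2, 0, 1))  # back to channels first -> (3, 360, 720)
print(img_hwc.shape, img_back.shape)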
TensorFlow uses the channels-last setting by default in its configuration.
The issue comes from the fact that some now-obsolete frameworks (such as Theano) used a channels-first approach; porting between the two conventions was a problem, particularly for newcomers.
The solution to your problem would be to re-train your model in channels_last format.
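If the model is built with Keras, a hedged sketch of how the data format can be set globally before rebuilding and re-training (assuming tf.keras; your training code may differ):
import tensorflow as tf

# Make Keras build new layers with channels-last (NHWC) tensors.
tf.keras.backend.set_image_data_format('channels_last')
print(tf.keras.backend.image_data_format())  # 'channels_last'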
You can convert a TF model with NCHW layout to IR by using --disable_nhwc_to_nchw with the Model Optimizer.
NCHW - channels first
NHWC - channels last
where N: batch size, C: number of channels, H: input image height, W: input image width.
By default, the MKLDNN plugin uses the NCHW data layout.
Related
I'm trying to use the TensorFlow 2 Object Detection API with a custom dataset with multiple classes to train an SSD. I took as a base the example provided by the documentation: https://github.com/tensorflow/models/blob/master/research/object_detection/colab_tutorials/eager_few_shot_od_training_tf2_colab.ipynb
My current problem is when I start the fine tuning:
InvalidArgumentError: The first dimension of paddings must be the rank
of inputs[2,2] [6] [Op:Pad]
That seems to be related to the model.provide_groundtruth section in train_step_fn. As I mentioned, I take my data from a TensorFlow record, map it to a dataset, and divide it into batches using padded_batch(tf.data.TFRecordDataset). That seems to be the correct way to feed the images into training, but now my problem is the ground truth, because it is also converted to batches of shape [batch_size, num_detections, coordinate_bbox]. Is this the problem? Any idea on how to fix this issue?
Thanks
P.S. I tried the approach of modifying the pipeline.config file and running model_main_tf2.py as was done in the past with TensorFlow 1, but this method is buggy.
Just to share with everyone: what resolved my issue was that I had split the images and ground truth into batches correctly, but I had never converted my labels to one-hot vector encoding.
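For reference, a hedged sketch of that one-hot conversion (num_classes and the tensor names are illustrative, not the exact code from my project):
import tensorflow as tf

num_classes = 3                                            # assumed number of classes
gt_classes = tf.constant([0, 2, 1])                        # zero-based class index per box
gt_classes_one_hot = tf.one_hot(gt_classes, num_classes)   # shape: (num_boxes, num_classes)
# provide_groundtruth expects the per-image class tensors one-hot encoded like this.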
One of the major problems I've encountered when converting PyTorch models to TensorFlow through ONNX is slowness, which appears to be related to the input shape, even though I was able to get bit-exact outputs from the two frameworks.
While the PyTorch input shape is B,C,H,W, the TensorFlow input shape is B,H,W,C, where B, C, H, W stand for batch size, channels, height and width, respectively. Technically, I solve the input shape problem easily when working in TensorFlow, using two calls to np.swapaxes:
# Single image, no batch size here yet
image = np.swapaxes(image, 0, 2) # Swapping C and H dimensions - result: C,W,H
image = np.swapaxes(image, 1, 2) # Swapping H and W dimensions - result: C,H,W (like Pytorch)
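(Equivalently, and perhaps more readable, a single np.transpose with an explicit axis permutation does the same thing:)
image = np.transpose(image, (2, 0, 1))  # H,W,C -> C,H,W in one call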
The slowness problem seems to be related to differences in the way convolutional operations are implemented in PyTorch vs TensorFlow. While PyTorch expects channels first, TensorFlow expects channels last.
As a result, when I visualize the models using Netron, the ONNX model looks clean and makes sense (first image), whereas the TensorFlow .pb formatted model looks like a big mess (second image).
Note: It appears that this problem has already concerned the writers of the onnx2keras library, which supports an experimental feature for changing the C,H,W ordering that originates in PyTorch into H,W,C.
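If it helps, a hedged sketch of how that onnx2keras option is typically invoked (the file name and the input name 'input' are assumptions; check your model's actual input node):
import onnx
from onnx2keras import onnx_to_keras

onnx_model = onnx.load('model.onnx')
# change_ordering=True asks the converter to produce an H,W,C (channels-last) graph
k_model = onnx_to_keras(onnx_model, ['input'], change_ordering=True)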
Any idea how to overcome this limitation? Are there other options for more abstractly exporting PyTorch models into Tensorflow?
ONNX (from PyTorch) - you can see the straight flow and the residual blocks:
TensorFlow (imported from the ONNX model) - almost nothing looks like a series of predefined operations:
I found a nicely trained LSTM-based network.
The network allows for masking.
for l in range(len(model.layers)):
    d = model.layers[l].__dict__
    print(d['supports_masking'])
    print(d['name'])
supports_masking is True for me for all the layers besides the input layers.
I also have a time series with missing timestamps, which I replace with the correct mask_value.
Does the network use the masked values like any other ordinary values to determine the final prediction, so that all the computations of the forward pass are actually executed (for example, updating the LSTM state for each timestamp in the input), or are the masked samples completely skipped, so the computation never takes place?
Keras will skip time steps, as said in the documentation.
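For example, a minimal tf.keras sketch of that pattern (mask_value=0. and the sizes are just assumptions; use whatever value marks your missing timestamps):
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Masking(mask_value=0., input_shape=(None, 8)),  # (timesteps, features)
    tf.keras.layers.LSTM(32),   # masked timesteps are skipped; the state is carried forward
    tf.keras.layers.Dense(1),
])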
How specifically does tensorflow apply dropout when calling tf.nn.rnn_cell.DropoutWrapper() ?
Everything I read about applying dropout to RNNs references this paper by Zaremba et al., which says don't apply dropout between recurrent connections. Neurons should be dropped out randomly before or after LSTM layers, but not on the recurrent (timestep-to-timestep) connections. OK.
The question I have is how are the neurons turned off with respect to time?
In the paper that everyone cites, it seems that a random 'dropout mask' is applied at each timestep, rather than generating one random 'dropout mask' and reusing it across all the timesteps of a given layer being dropped out, then generating a new 'dropout mask' on the next batch.
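To make the distinction concrete, a small NumPy sketch of the two schemes (purely illustrative shapes and probabilities, not what any framework necessarily does):
import numpy as np

timesteps, units, keep_prob = 5, 4, 0.8

# Scheme A: a fresh dropout mask is sampled at every timestep
masks_per_step = (np.random.rand(timesteps, units) < keep_prob) / keep_prob

# Scheme B: one mask is sampled once and reused for all timesteps of the sequence
single_mask = (np.random.rand(1, units) < keep_prob) / keep_prob
masks_reused = np.repeat(single_mask, timesteps, axis=0)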
Further, and probably what matters more at the moment, how does tensorflow do it? I've checked the tensorflow api and tried searching around for a detailed explanation but have yet to find one.
Is there a way to dig into the actual tensorflow source code?
You can check the implementation here.
It uses the dropout op on the input into the RNNCell, then on the output, with the keep probabilities you specify.
It seems like each sequence you feed in gets a new mask for input, then for output. No changes inside of the sequence.
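For reference, a hedged sketch of the wrapper usage in the old TF1-style API (the cell size and keep probabilities are just example values):
import tensorflow as tf

cell = tf.nn.rnn_cell.BasicLSTMCell(num_units=128)
cell = tf.nn.rnn_cell.DropoutWrapper(cell,
                                     input_keep_prob=0.8,   # dropout on the cell's inputs
                                     output_keep_prob=0.8)  # dropout on the cell's outputs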
I'm trying to use TensorFlow to train output servo commands given an input image.
I plan on using a file as @mrry suggested in this question, with the images like so:
../some/path/some_img.JPG *some_label*
My question is, what are the label formats I can provide to TensorFlow and what structures are suggested?
My data is basically n servo commands from 0-10 seconds. A vector would work great:
[0,2,4,3]
or similarly:
[0,.25,.4,.3]
I couldn't find much about labels in the docs. Can anyone shed any light on TensorFlow labels?
And a very related question is what is the best way to structure these for TensorFlow to properly learn from them?
In TensorFlow, labels are just generic tensors. You can use any kind of tensor to store your labels. In your case, a 1-D tensor with shape (4,) seems to be what you want.
Labels only differ from the rest of the data by their use in the computational graph. (Usually) labels should only be used inside the loss function, while you propagate the other data through the whole network. For your problem, a regression output with 4 values should work.
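As a minimal illustration, a hedged tf.keras sketch of such a network (the input size and layer sizes are assumptions, not a recommendation), with a 4-value regression output trained with mean squared error:
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(16, 3, activation='relu', input_shape=(120, 160, 3)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(4),   # 4 servo command values, e.g. [0, .25, .4, .3]
])
model.compile(optimizer='adam', loss='mse')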
Also, look at my newest comment to the (old) question. Using the slice_input_producer seems to be preferable in your case.