I want to remove the last layer(s) from the MobileBERT model on TensorFlow Hub. I know there is a solution for a Keras Model in TensorFlow, but this case is different from that one.
I was thinking of something like this, but it doesn't seem user-friendly.
What is the common way of doing this?
There is no first-class API to do this. A solution along the lines of what you have mentioned is the way to go.
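For concreteness, here is a minimal sketch of that approach. The Hub handle and the output names are assumptions on my part (TF2 BERT-family SavedModels generally return a dict with pooled_output, sequence_output, and per-block encoder_outputs); "removing" the last layers then amounts to reading an earlier block's output rather than deleting anything:

    import tensorflow as tf
    import tensorflow_hub as hub

    # Hypothetical handle; substitute the MobileBERT SavedModel you actually use.
    handle = "https://tfhub.dev/tensorflow/mobilebert_en_uncased_L-24_H-128_B-512_A-4_F-4_OPT/1"
    encoder = hub.KerasLayer(handle, trainable=False)

    seq_len = 128
    inputs = dict(
        input_word_ids=tf.keras.Input(shape=(seq_len,), dtype=tf.int32),
        input_mask=tf.keras.Input(shape=(seq_len,), dtype=tf.int32),
        input_type_ids=tf.keras.Input(shape=(seq_len,), dtype=tf.int32),
    )
    outputs = encoder(inputs)

    # "encoder_outputs" holds one tensor per transformer block; taking an
    # earlier entry effectively drops the final block(s).
    truncated = tf.keras.Model(inputs, outputs["encoder_outputs"][-2])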
I'm trying to build a piece of software in which I need to reverse the convolution process, and I haven't found anything useful.
Yes, it is called Transposed Convolution in TensorFlow and also in PyTorch. Here is the link for TF1.14.
Here is the one for TF2.0.
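In TF2 the Keras layer is the easiest entry point. A minimal sketch that upsamples an 8x8 feature map:

    import tensorflow as tf

    # Toy "reverse convolution": upsample an 8x8 feature map to 16x16.
    x = tf.random.normal([1, 8, 8, 32])            # (batch, height, width, channels)
    deconv = tf.keras.layers.Conv2DTranspose(
        filters=16, kernel_size=3, strides=2, padding="same")
    y = deconv(x)
    print(y.shape)                                 # (1, 16, 16, 16)

In TF1.x the equivalent op is tf.nn.conv2d_transpose, where you pass the desired output shape explicitly.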
In https://www.tensorflow.org/js/guide/save_load, the format for saving model files is described as one that uses a model.json and a corresponding model.weights.bin. I'm not sure if there's a name for this format (I think it's the same as https://js.tensorflow.org/api/latest/#class:LayersModel, but I'm not entirely certain), and I'm wondering if there's a way to visualize these models as a graph.
I was expecting to be able to load and view them in TensorBoard but don't see any way to do this with its "Graphs" tool, so perhaps no one has made anything like this yet.
One minimal way to do this is with the summary method on a loaded model, which logs a layer table to the console (a minimal sketch, with a placeholder model path):
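    // Inside an async function; the path is a placeholder for your model.json.
    const model = await tf.loadLayersModel('path/to/model.json');
    model.summary();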
In the example I tried, though, this output didn't match what I would expect from looking at the model.json. It seems to report only the outermost Sequential layer, but I didn't look more closely.
You can examine models, layers, tensors, etc. via the Visor API: https://js.tensorflow.org/api_vis/latest/. You just need to install it with npm i @tensorflow/tfjs-vis in order to use it.
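A minimal sketch, assuming a LayersModel already loaded into a variable named model:

    import * as tfvis from '@tensorflow/tfjs-vis';

    // Renders a layer table in the tfjs-vis visor side panel.
    tfvis.show.modelSummary({ name: 'Model Summary' }, model);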
Currently, there are a lot of deep learning models developed in Caffe rather than TensorFlow. If I want to re-write these models in TensorFlow, how should I start? I am not familiar with the Caffe structure. It seems to me that there are some files storing only the model architecture. My guess is that I only need to understand that architecture design and transfer it into TensorFlow, since the input/output/training will be re-written anyway. Does this line of thinking make sense?
I see that some Caffe implementations also need to hack into the original Caffe framework down to the C++ level and make modifications. I am not sure in what kind of scenario a Caffe model developer needs to go that deep. If I just want to re-implement their models in TensorFlow, do I need to check their C++ modifications, which are sometimes not documented at all?
I know there are some Caffe-to-TensorFlow conversion tools, but they always come with constraints, and I think rewriting the model directly may be more straightforward.
Any thoughts, suggestions, and links to tutorials are highly appreciated.
I have already asked a similar question.
To synthesize the possible answers:
You can either use pre-existing tools like ethereon's kaffe, which is really simple to use. But its simplicity comes at a cost: it is not easy to debug.
Or, as @Yaroslav Bulatov answered, start from scratch and try to make each layer match. In this regard I would advise you to look at ry's GitHub, which is a remarkable example: it consists of small helper functions that show how to reshape the weights appropriately from Caffe to TensorFlow (the only real work needed to make simple models match), and it also checks activations layer by layer.
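If you go the from-scratch route, the core of those helper functions is an axis permutation. A minimal sketch (the function name is mine):

    import numpy as np

    def caffe_conv_to_tf(w):
        """Reorder a Caffe conv kernel for TensorFlow.

        Caffe stores conv weights as (out_ch, in_ch, kh, kw);
        TensorFlow's conv2d expects (kh, kw, in_ch, out_ch).
        """
        return np.transpose(w, (2, 3, 1, 0))

Fully connected layers that follow a conv layer need extra care: Caffe flattens feature maps channel-first while TensorFlow flattens channel-last, so the rows of the first FC weight matrix must be permuted to match, not just transposed. Checking activations layer by layer, as ry's code does, catches exactly these mismatches.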
I am trying to implement a dynamic network, one that can change its structure according to the input data. Here is an example: https://arxiv.org/pdf/1511.02799v3.pdf
I wonder if it is possible to use TensorFlow to implement a dynamic network.
I think we may need to use placeholders to control the network?
Thank you very much.
TensorFlow Fold was announced a few months after your question; it is a somewhat roundabout way to do this. I have heard other libraries like MXNet will support this too.
https://research.googleblog.com/2017/02/announcing-tensorflow-fold-deep.html
You might want to check out DyNet for true dynamic graphs.
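As for the placeholder idea in the question, here is a minimal TF1-style sketch with tf.cond: both branches are built into the graph, and a boolean fed at run time picks which one executes. Note this only selects among pre-built branches; truly input-dependent topology is what Fold and DyNet address.

    import tensorflow as tf

    x = tf.placeholder(tf.float32, [None, 8])
    use_deep = tf.placeholder(tf.bool, [])

    def shallow():
        return tf.layers.dense(x, 4)

    def deep():
        h = tf.layers.dense(x, 16, activation=tf.nn.relu)
        return tf.layers.dense(h, 4)

    # Both sub-networks exist in the graph; use_deep picks one per run.
    y = tf.cond(use_deep, deep, shallow)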
Is the correct general approach simply to copy all of the code of class BasicLSTMCell(RNNCell) and replace all the matrix multiplications with conv2d operations? What should I keep in mind when implementing it this way?
That is the basic idea. I got an implementation of it working in TensorFlow here. It can generate videos that look like this, and they seem to work surprisingly well. I edited the rnn_cell.py file to get it working.
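The gist of that edit: in the LSTM gate equations, every matmul over [input, hidden] becomes a padded convolution, so the hidden and cell states stay feature maps. A minimal sketch in Keras terms (not the answerer's actual rnn_cell.py edit; peephole connections are omitted):

    import tensorflow as tf

    class ConvLSTMCellSketch(tf.keras.layers.Layer):
        """One LSTM step with matmuls replaced by a 'same' convolution."""

        def __init__(self, filters, kernel_size=3, **kwargs):
            super().__init__(**kwargs)
            # One conv produces all four gate pre-activations at once.
            self.gates = tf.keras.layers.Conv2D(
                4 * filters, kernel_size, padding="same")

        def call(self, x, states):
            h, c = states                      # hidden and cell state maps
            z = self.gates(tf.concat([x, h], axis=-1))
            i, f, o, g = tf.split(z, 4, axis=-1)
            # The +1.0 mirrors BasicLSTMCell's forget_bias=1.0 default.
            c_new = tf.sigmoid(f + 1.0) * c + tf.sigmoid(i) * tf.tanh(g)
            h_new = tf.sigmoid(o) * tf.tanh(c_new)
            return h_new, (h_new, c_new)

Things to keep in mind: use "same" padding so the state shapes stay fixed across time steps, and keep BasicLSTMCell's forget-gate bias trick. Recent TensorFlow versions also ship tf.keras.layers.ConvLSTM2D, which implements this directly.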