I produced a feature vector I am very happy with by cloning the BERT repo, downloading the "BERT-Base, Uncased" pre-trained model, and running extract_features.py like so:
PYTHONPATH=. python extract_features.py --input_file=~/sandbox/input.txt --output_file=~/sandbox/bert_output.jsonl --vocab_file=$BERT_BASE_DIR/vocab.txt --bert_config_file=$BERT_BASE_DIR/bert_config.json --init_checkpoint=$BERT_BASE_DIR/bert_model.ckpt --layers=-2 --max_seq_length=128 --batch_size=8
Note the --layers=-2 arg, which specifies that I want the features from the second-to-last layer.
I am now trying to reproduce the same features using this TensorFlow Hub model, which I believe is the same model. Since only the output layer is exposed, I used this hack suggested on the TF Hub GitHub to access the desired layer. The feature vector I get is suspiciously close, but not identical (individual floats are within about 1% of each other). I have confirmed that my input tokens are identical in both cases. I'm hoping someone with more knowledge of BERT configurations and internals can spot something obvious that I've overlooked, or suggest a way to proceed with debugging. I'm at a loss, since the two interfaces are quite different.
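One way to narrow down a ~1% gap is to compare the two outputs token by token instead of eyeballing floats, to see whether the difference is spread evenly or concentrated at particular positions. Below is a minimal plain-Python sketch, assuming each line of the extract_features.py JSONL output has the layout {"features": [{"token": ..., "layers": [{"index": ..., "values": [...]}]}]}. The helper names load_layer_vectors and compare_vectors are hypothetical, and converting the TF Hub output into plain Python lists is left to you.

```python
import json
import math


def load_layer_vectors(jsonl_line, layer_index=-2):
    """Return (tokens, vectors) for one line of extract_features.py output.

    Assumes the layout
    {"features": [{"token": ..., "layers": [{"index": ..., "values": [...]}]}]}.
    """
    record = json.loads(jsonl_line)
    tokens, vectors = [], []
    for feature in record["features"]:
        # Pick the requested layer (e.g. -2 for the second-to-last layer).
        layer = next((entry for entry in feature["layers"]
                      if entry["index"] == layer_index), None)
        if layer is None:
            raise KeyError(f"layer {layer_index} missing for token {feature['token']!r}")
        tokens.append(feature["token"])
        vectors.append(layer["values"])
    return tokens, vectors


def compare_vectors(a, b):
    """Max absolute difference, max relative difference and cosine similarity."""
    if len(a) != len(b):
        raise ValueError("vectors have different lengths")
    max_abs = max(abs(x - y) for x, y in zip(a, b))
    # Relative to the larger magnitude; the tiny floor avoids dividing by zero.
    max_rel = max(abs(x - y) / max(abs(x), abs(y), 1e-12) for x, y in zip(a, b))
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    cosine = dot / (norm_a * norm_b) if norm_a and norm_b else float("nan")
    return {"max_abs": max_abs, "max_rel": max_rel, "cosine": cosine}
```

Calling compare_vectors on each token's pair of vectors (reference from the JSONL file, candidate from the Hub run) gives a per-token report, which makes it easier to tell a systematic offset from a problem at a few specific positions.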
I'm looking into training an object detection network using TensorFlow, and I had a look at the TF2 Model Zoo. I noticed that it has considerably fewer models than the directory /models/research/models/; among the missing ones is the MobileDet with SSDLite developed for the Jetson Xavier.
To clarify, the readme says that there is a MobileDet GPU with SSDLite, and that the model and checkpoints trained on COCO are provided, yet I couldn't find them anywhere in the repo.
How is one supposed to use those models?
I already have a custom-trained MobileDetv3 for image classification, and I was hoping to find a way to turn the network into an object detection network, as described in the MobileDetv3 paper. If this is not straightforward, training a network from scratch could be OK too; I just need to know where to start.
If you plan to use the Object Detection API, you can't use your existing model. You have to choose from the list of models here for v2 and here for v1.
The documentation is very well maintained, and the TensorFlow team explains the steps to train, validate, or run inference (test) on custom data very well here. The link is for TensorFlow 2. However, if you wish to use v1, the process is fairly similar, and there are numerous blogs/videos explaining how to go about it.
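One of the steps in that custom-data workflow is writing a label map that assigns an integer ID to each class name. A small sketch that generates the text of a label_map.pbtxt file; the helper name make_label_map is hypothetical, and it assumes class names contain no single quotes.

```python
def make_label_map(class_names):
    """Build the text of a label_map.pbtxt file for the Object Detection API.

    IDs start at 1 because 0 is reserved for the background class.
    """
    entries = []
    for class_id, name in enumerate(class_names, start=1):
        # Doubled braces in the f-string produce literal { and }.
        entries.append(f"item {{\n  id: {class_id}\n  name: '{name}'\n}}\n")
    return "\n".join(entries)
```

The result can be written with something like Path("label_map.pbtxt").write_text(make_label_map(["cat", "dog"])), and the label_map_path fields of the pipeline config then point at that file.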
My goal is to test out Google's BERT algorithm in Google Colab.
I'd like to use a pre-trained custom model for Finnish (https://github.com/TurkuNLP/FinBERT). The model is not available in the TF Hub library, and I have not found a way to load it with TensorFlow Hub.
Is there a neat way to load and use a custom model with Tensorflow Hub?
Fundamentally: yes. Everyone can create the kind of models that TF Hub hosts, and I hope authors of interesting models do consider that.
For TF1 and the hub.Module format tailored to it, see
https://www.tensorflow.org/hub/tf1_hub_module#creating_a_new_module
For TF2 and its revised SavedModel format, see
https://www.tensorflow.org/hub/tf2_saved_model#creating_savedmodels_for_tf_hub
That said, a sophisticated model like BERT requires a bit of attention to export it with all bells and whistles, so it helps to have some tooling to build on. The BERT reference implementation for TF2 at https://github.com/tensorflow/models/tree/master/official/nlp/bert comes with an open-sourced export_tfhub.py script, and anyone can use that to export custom BERT instances created from that code base.
However, I understand from https://github.com/TurkuNLP/FinBERT/blob/master/nlpl_tutorial/training_bert.md#general-info that you are using Nvidia's fork of the original TF1 implementation of BERT. There are Hub modules created from the original research code, but the tooling to that end has not been open-sourced, and Nvidia doesn't seem to have added their own either.
If that's not changing, you'll probably have to resort to doing things the pedestrian way: get acquainted with their codebase and load their checkpoints into it.
For some reason, I get wildly different loss and accuracy when I evaluate my BERT test set right after training vs. when I load from a saved checkpoint. I thought it might be my adaptation of BERT, so I modified the run_classifier.py script as little as possible to fit my use case, and I am still seeing this problem.
The only explanation I can think of is that the model isn't loading correctly, but I don't know how to fix it. I believe I'm loading it as originally intended: for the init_checkpoint parameter, I pass path/to/classifier/model.ckpt-{last_step}. There are three model files (meta, index, data), but there are also the checkpoint, events, and graph files. Do I need to do something with those other three files as well? I'm used to Keras, and this pure TensorFlow saving/loading process seems unnecessarily convoluted to me.
Thank you in advance for any help/insight regarding BERT or pure TF saving/loading! If you're unfamiliar with BERT, here's the GitHub link: BERT GitHub
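For context on the files: in a TF1 checkpoint, model.ckpt-N.index and model.ckpt-N.data-* hold the variable values, model.ckpt-N.meta holds the graph definition, and all three share the prefix model.ckpt-N, which is what init_checkpoint expects. The small text file named checkpoint records which prefix is newest, the events file contains TensorBoard summaries, and graph.pbtxt is a text dump of the graph; none of these is needed to restore weights. TensorFlow's own tf.train.latest_checkpoint(dir) reads that checkpoint file, and tf.train.list_variables(prefix) lists what a checkpoint contains. A plain-Python sketch of the first, with hypothetical helper names, in case you want to check which prefix a directory actually points to:

```python
import os
import re


def checkpoint_prefix_from_state(state_text, checkpoint_dir):
    """Extract the latest checkpoint prefix from the text of a `checkpoint` file.

    The file is a small text-format record such as:
        model_checkpoint_path: "model.ckpt-1000"
        all_model_checkpoint_paths: "model.ckpt-500"
        all_model_checkpoint_paths: "model.ckpt-1000"
    """
    match = re.search(r'^model_checkpoint_path:\s*"([^"]+)"', state_text, re.MULTILINE)
    if match is None:
        return None
    path = match.group(1)
    # Relative entries are resolved against the directory holding the file.
    return path if os.path.isabs(path) else os.path.join(checkpoint_dir, path)


def latest_checkpoint_prefix(checkpoint_dir):
    """Pure-Python stand-in for tf.train.latest_checkpoint(checkpoint_dir)."""
    state_path = os.path.join(checkpoint_dir, "checkpoint")
    if not os.path.exists(state_path):
        return None
    with open(state_path) as f:
        return checkpoint_prefix_from_state(f.read(), checkpoint_dir)
```

Comparing this prefix with the one passed as init_checkpoint is a quick sanity check that evaluation restores the weights you think it does.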
I know how to load pre-trained image models from TensorFlow Hub, like so:
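import tensorflow as tf
import tensorflow_hub as hub
# Load the module (TF1 hub.Module API, graph mode)
image_module = hub.Module('https://tfhub.dev/google/imagenet/mobilenet_v2_035_128/feature_vector/2')
# RGB images with float values in [0, 1], 128x128 for this module
batch_images = tf.placeholder(tf.float32, shape=[None, 128, 128, 3])
# Get feature vectors
features = image_module(batch_images)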
I also know how to customize the output of this model (fine-tune it on a new dataset). The existing modules expect the input batch_images to be an RGB image tensor.
My question: instead of an RGB image of certain dimensions, I would like to use a tensor (dimensions 20x20x128, from a different model) as input to the Hub model. This means I need to bypass the initial layers of the TF Hub model definition (I don't need them). Is this possible with the TF Hub module APIs? The documentation is not clear on this.
P.S.: I can do this easily by defining my own layers, but I'm trying to see if I can use the TF Hub APIs.
The existing https://tfhub.dev/google/imagenet/... modules do not support this.
Generally speaking, the hub.Module format allows multiple signatures (that is, combinations of input/output tensors; think feeds and fetches as in tf.Session.run()). So module publishers can arrange for that if there is a common usage pattern they want to support.
But for free-form experimentation at this level of sophistication, you are probably better off directly using and tweaking the code that defines the models, such as TF Slim (for TF1.x) or Keras Applications (also for TF2). Both provide ImageNet-pretrained checkpoints that you can download and restore alongside the code.
Has anyone implemented Faster R-CNN (FRCNN) in TensorFlow?
I found some related repos:
1. Implement roi pool layer
2. Implement fast RCNN based on py-faster-rcnn repo
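For reference, the RoI pooling operation that the first repo implements can be sketched in plain Python for a single channel and a single region. This is a minimal illustration, assuming the RoI is already expressed in feature-map coordinates (no spatial scale) and using floor/ceil bin edges modeled on the Fast R-CNN RoI pooling layer; the function name roi_max_pool is hypothetical, and a real layer would run over all channels and RoIs on the GPU.

```python
import math


def roi_max_pool(feature_map, roi, pooled_h, pooled_w):
    """Max-pool one region of a single-channel 2-D feature map into a fixed grid.

    feature_map: list of rows (H x W). roi: (x0, y0, x1, y1), inclusive,
    in feature-map coordinates. Returns a pooled_h x pooled_w list of lists.
    """
    height, width = len(feature_map), len(feature_map[0])
    # Round the RoI corners to whole feature-map cells.
    x0, y0, x1, y1 = (int(round(v)) for v in roi)
    roi_h = max(y1 - y0 + 1, 1)
    roi_w = max(x1 - x0 + 1, 1)
    bin_h = roi_h / pooled_h
    bin_w = roi_w / pooled_w
    output = []
    for ph in range(pooled_h):
        # floor/ceil edges so every cell of the RoI falls in at least one bin,
        # clipped to the feature map.
        h_start = min(max(y0 + math.floor(ph * bin_h), 0), height)
        h_end = min(max(y0 + math.ceil((ph + 1) * bin_h), 0), height)
        row = []
        for pw in range(pooled_w):
            w_start = min(max(x0 + math.floor(pw * bin_w), 0), width)
            w_end = min(max(x0 + math.ceil((pw + 1) * bin_w), 0), width)
            values = [feature_map[h][w]
                      for h in range(h_start, h_end)
                      for w in range(w_start, w_end)]
            # Bins that fall entirely outside the map produce 0.
            row.append(max(values) if values else 0.0)
        output.append(row)
    return output
```

For a 4x4 map and the RoI (0, 0, 3, 3), a 2x2 output takes the maximum of each 2x2 quadrant; the fixed output size is what lets the pooled regions be fed to the fully connected classifier layers.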
For 1: assuming the RoI pooling layer works (I haven't tried it), the following still need to be implemented:
ROI data layer, e.g. roidb.
Linear (bounding-box) regression, e.g. SmoothL1Loss.
ROI pooling post-processing for end-to-end training, which should convert the ROI pooling layer's output into input for the classifier network.
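The regression item refers to bounding-box regression trained with the smooth L1 loss from the Fast R-CNN paper, using the R-CNN parameterization tx = (x - xa)/wa, ty = (y - ya)/ha, tw = log(w/wa), th = log(h/ha). A plain-Python sketch of the target encoding and the loss follows; it assumes continuous (x0, y0, x1, y1) coordinates and omits the "+1" pixel convention some implementations use, and the optional sigma follows a variant used by some Faster R-CNN implementations. All function names are hypothetical.

```python
import math


def smooth_l1(x, sigma=1.0):
    """Smooth L1: quadratic near zero, linear further out.

    With sigma=1 this is 0.5*x**2 for |x| < 1 and |x| - 0.5 otherwise;
    a larger sigma narrows the quadratic zone to |x| < 1/sigma**2.
    """
    s2 = sigma * sigma
    ax = abs(x)
    if ax < 1.0 / s2:
        return 0.5 * s2 * x * x
    return ax - 0.5 / s2


def center_size(box):
    """(x0, y0, x1, y1) -> (center_x, center_y, width, height)."""
    x0, y0, x1, y1 = box
    return (x0 + x1) / 2.0, (y0 + y1) / 2.0, x1 - x0, y1 - y0


def encode_box(box, anchor):
    """Regression targets (tx, ty, tw, th) of a ground-truth box w.r.t. an anchor/proposal."""
    bx, by, bw, bh = center_size(box)
    ax, ay, aw, ah = center_size(anchor)
    return ((bx - ax) / aw, (by - ay) / ah, math.log(bw / aw), math.log(bh / ah))


def box_regression_loss(predicted, target, sigma=1.0):
    """Sum of smooth L1 over the four regression coordinates of one box."""
    return sum(smooth_l1(p - t, sigma) for p, t in zip(predicted, target))
```

For example, a box shifted half an anchor width to the right of a same-sized anchor gets targets (0.5, 0.0, 0.0, 0.0), and predicting all zeros for it costs 0.5 * 0.5**2 = 0.125.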
For 2: it seems to rely on py-faster-rcnn, which is based on Caffe, for preprocessing (e.g. roidb), and then feeds the data into TensorFlow to train the model. That seems odd, so I haven't tried it.
So what I want to know is: will TensorFlow support Faster R-CNN in the future? If not, have I misunderstood anything above? Or is there a repo, or someone, that supports it?
TensorFlow has just released an official Object Detection API here, which can be used, for instance, with their various Slim models.
This API contains implementations of various object detection pipelines, including the popular Faster R-CNN, along with pre-trained models.
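Fine-tuning one of those pre-trained Faster R-CNN models on your own classes typically involves editing the pipeline.config that comes with it, among other things setting num_classes and pointing fine_tune_checkpoint at the downloaded weights. The sketch below does this with plain text substitution; customize_pipeline_config is a hypothetical helper, and it assumes each field appears in the usual `field: value` form of the sample configs. The API's own config utilities (object_detection.utils.config_util) parse the protobuf properly and are the more robust choice.

```python
import re


def customize_pipeline_config(config_text, num_classes, fine_tune_checkpoint):
    """Patch num_classes and fine_tune_checkpoint in a pipeline.config string."""
    # Replace every num_classes entry (normally one, inside the model block).
    # A function replacement avoids re.sub interpreting backslashes in paths.
    config_text = re.sub(r"num_classes:\s*\d+",
                         lambda m: f"num_classes: {num_classes}",
                         config_text)
    # Match only the quoted fine_tune_checkpoint field, not
    # fine_tune_checkpoint_type or similar fields that share the prefix.
    config_text = re.sub(r'fine_tune_checkpoint:\s*"[^"]*"',
                         lambda m: f'fine_tune_checkpoint: "{fine_tune_checkpoint}"',
                         config_text)
    return config_text
```

Input paths and the label map path still need to be set the same way before training.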