I am using the Language Interpretability Toolkit (LIT) to load and analyze a BERT model that I pre-trained on an NER task.
However, when I start the LIT script with the path to my pre-trained model, it fails to initialize the weights and tells me:
modeling_utils.py:648] loading weights file bert_remote/examples/token-classification/Data/Models/results_21_03_04_cleaned_annotations/04.03._8_16_5e-5_cleaned_annotations/04-03-2021 (15.22.23)/pytorch_model.bin
modeling_utils.py:739] Weights of BertForTokenClassification not initialized from pretrained model: ['bert.pooler.dense.weight', 'bert.pooler.dense.bias']
modeling_utils.py:745] Weights from pretrained model not used in BertForTokenClassification: ['bert.embeddings.position_ids']
It then simply uses the bert-base-german-cased version of BERT, which of course doesn't have my custom labels and thus fails to predict anything. I think it might have to do with PyTorch, but I can't find the error.
If relevant, here is how I load my dataset into CoNLL 2003 format (modification of the dataloader scripts found here):
def __init__(self):
    # Read CoNLL test files
    self._examples = []
    data_path = "lit_remote/lit_nlp/examples/datasets/NER_Data"
    with open(os.path.join(data_path, "test.txt"), "r", encoding="utf-8") as f:
        lines = f.readlines()
    for line in lines[:2000]:
        if line != "\n":
            # Each non-empty line is "<token> <label>"; strip the trailing newline
            # so the label does not carry a "\n".
            token, label = line.strip().split(" ")
            self._examples.append({
                'token': token,
                'label': label,
            })
        else:
            # Blank lines separate sentences; keep them as "O"-tagged newlines.
            self._examples.append({
                'token': "\n",
                'label': "O"
            })

def spec(self):
    return {
        'token': lit_types.Tokens(),
        'label': lit_types.SequenceTags(align="token"),
    }
And this is how I initialize the model and start the LIT server (modification of the simple_pytorch_demo.py script found here):
def __init__(self, model_name_or_path):
    self.tokenizer = transformers.AutoTokenizer.from_pretrained(
        model_name_or_path)
    model_config = transformers.AutoConfig.from_pretrained(
        model_name_or_path,
        num_labels=15,  # FIXME CHANGE
        output_hidden_states=True,
        output_attentions=True,
    )
    # This is just a regular PyTorch model.
    self.model = _from_pretrained(
        transformers.AutoModelForTokenClassification,
        model_name_or_path,
        config=model_config)
    self.model.eval()

## Some omitted snippets here

def input_spec(self) -> lit_types.Spec:
    return {
        "token": lit_types.Tokens(),
        "label": lit_types.SequenceTags(align="token")
    }

def output_spec(self) -> lit_types.Spec:
    return {
        "tokens": lit_types.Tokens(),
        "probas": lit_types.MulticlassPreds(parent="label", vocab=self.LABELS),
        "cls_emb": lit_types.Embeddings()
    }
This actually seems to be expected behaviour. In the documentation of the GPT models, the Hugging Face team writes:
This will issue a warning about some of the pretrained weights not being used and some weights being randomly initialized. That’s because we are throwing away the pretraining head of the BERT model to replace it with a classification head which is randomly initialized.
So it does not seem to be a problem for fine-tuning. In my use case described above, it worked despite the warning as well.
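If you want to double-check that the checkpoint itself carries the custom label set, a quick sanity check (a sketch; model_dir is a placeholder for the checkpoint directory from the log above) is to load the config and look at num_labels and id2label:

import transformers

# Placeholder path; point this at the directory containing pytorch_model.bin / config.json.
model_dir = "path/to/finetuned_model_dir"

config = transformers.AutoConfig.from_pretrained(model_dir)
print(config.num_labels)  # expected: 15, matching the fine-tuned NER head
print(config.id2label)    # the custom labels (or generic LABEL_0..LABEL_14 if they were never set)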
The TensorFlow 2 documentation for preprocessing / feature engineering on top of a Keras model seems quite confusing and isn't very friendly.
Currently I have a simple N-layer Keras model with TF feature columns feeding into a dense features layer. For training I read CSV files with the tf.data API, and I have written a feature engineering function that creates new features via dataset.map:
def feature_engg_features(features):
    # Add new features
    features['nodlgrbyvpatd'] = features['NODLGR'] / features['VPATD']
    return features
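For context, this is roughly how such a function is attached to the training input pipeline (a sketch; the file pattern, batch size and label column name are assumptions, not taken from the gist):

import tensorflow as tf

def feature_engg(features, label):
    # Same derived feature as above, computed batch-wise inside the input pipeline.
    features['nodlgrbyvpatd'] = features['NODLGR'] / features['VPATD']
    return features, label

train_ds = tf.data.experimental.make_csv_dataset(
    "train_*.csv",          # hypothetical file pattern
    batch_size=32,
    label_name="label",     # hypothetical label column
    num_epochs=1)
train_ds = train_ds.map(feature_engg)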
I can save the model easily using the tf.keras.models.save_model method. However, I am having trouble figuring out how to attach the feature engineering steps to the serving function.
Requirement: I want to take the same feature engineering function above and attach it to my serving function, so that the same feature engineering steps are applied to JSON input arriving via tensorflow_model_server. I know about the Lambda layer option in Keras, but I want to do this via the saved_model route, and there are a lot of difficulties there.
For example, the code below gives an error:
def feature_engg_features(features):
    # Add new features
    features['nodlgrbyvpatd'] = features['NODLGR'] / features['VPATD']
    return features

@tf.function
def serving(data):
    data = tf.map_fn(feature_engg_features, data, dtype=tf.float32)
    # Predict
    predictions = m_(data)
    return predictions

version = "1"
tf.keras.models.save_model(
    m_,
    "./exported_model/" + version,
    overwrite=True,
    include_optimizer=True,
    save_format=None,
    signatures=serving,
    options=None
)
Error:
Only `tf.functions` with an input signature or concrete functions can be used as a signature.
The above error is because I have not provided an input signature for my Keras model, and with 13 input fields I am not able to work out what the expected input signature should look like.
So I wanted to know if anyone knows the shortest way of solving this. This is a very basic requirement, but TensorFlow seems to have made it quite complicated for serving a Keras model.
GIST: https://colab.research.google.com/gist/rafiqhasan/6abe93ac454e942317005febef59a459/copy-of-dl-e2e-structured-mixed-data-tf-2-keras-estimator.ipynb
EDIT:
I fixed it: a TensorSpec has to be generated and passed for each feature, and the model has to be called inside the serving function.
@tf.function
def serving(WERKS, DIFGRIRD, SCENARIO, TOTIRQTY, VSTATU, EKGRP, TOTGRQTY,
            VPATD, EKORG, NODLGR, DIFGRIRV, NODLIR, KTOKK):
    # Feature engineering
    nodlgrbyvpatd = tf.cast(NODLGR / VPATD, tf.float32)

    payload = {
        'WERKS': WERKS,
        'DIFGRIRD': DIFGRIRD,
        'SCENARIO': SCENARIO,
        'TOTIRQTY': TOTIRQTY,
        'VSTATU': VSTATU,
        'EKGRP': EKGRP,
        'TOTGRQTY': TOTGRQTY,
        'VPATD': VPATD,
        'EKORG': EKORG,
        'NODLGR': NODLGR,
        'DIFGRIRV': DIFGRIRV,
        'NODLIR': NODLIR,
        'KTOKK': KTOKK,
        'nodlgrbyvpatd': nodlgrbyvpatd,
    }

    # Predict.
    # If the number of params passed here or their data type is wrong,
    # this fails with "Couldn't compute output tensor".
    predictions = m_(payload)
    return predictions

serving = serving.get_concrete_function(
    WERKS=tf.TensorSpec([None], dtype=tf.string, name='WERKS'),
    DIFGRIRD=tf.TensorSpec([None], name='DIFGRIRD'),
    SCENARIO=tf.TensorSpec([None], dtype=tf.string, name='SCENARIO'),
    TOTIRQTY=tf.TensorSpec([None], name='TOTIRQTY'),
    VSTATU=tf.TensorSpec([None], dtype=tf.string, name='VSTATU'),
    EKGRP=tf.TensorSpec([None], dtype=tf.string, name='EKGRP'),
    TOTGRQTY=tf.TensorSpec([None], name='TOTGRQTY'),
    VPATD=tf.TensorSpec([None], name='VPATD'),
    EKORG=tf.TensorSpec([None], dtype=tf.string, name='EKORG'),
    NODLGR=tf.TensorSpec([None], name='NODLGR'),
    DIFGRIRV=tf.TensorSpec([None], name='DIFGRIRV'),
    NODLIR=tf.TensorSpec([None], name='NODLIR'),
    KTOKK=tf.TensorSpec([None], dtype=tf.string, name='KTOKK')
)

version = "1"
tf.saved_model.save(
    m_,
    "./exported_model/" + version,
    signatures=serving
)
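As a side note, the hand-written get_concrete_function call does not have to be typed out field by field; it can be generated from a name-to-dtype mapping (a sketch; feature_dtypes is a hypothetical dict, and the numeric specs default to float32 exactly as in the explicit version):

feature_dtypes = {
    'WERKS': tf.string, 'SCENARIO': tf.string, 'VSTATU': tf.string,
    'EKGRP': tf.string, 'EKORG': tf.string, 'KTOKK': tf.string,
    'DIFGRIRD': tf.float32, 'TOTIRQTY': tf.float32, 'TOTGRQTY': tf.float32,
    'VPATD': tf.float32, 'NODLGR': tf.float32, 'DIFGRIRV': tf.float32,
    'NODLIR': tf.float32,
}

# This would replace the explicit get_concrete_function call above.
serving = serving.get_concrete_function(
    **{name: tf.TensorSpec([None], dtype=dtype, name=name)
       for name, dtype in feature_dtypes.items()})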
So the right way to do this is as shown above: feature engineering and pre-processing can be done in the serving_default signature by wrapping the model call in a tf.function and exporting a concrete function built from one TensorSpec per feature. I tested it further via TensorFlow Serving.
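For completeness, once the exported model is loaded by tensorflow_model_server, the serving_default signature can be called with a JSON payload such as the following (a sketch; the model name, port and feature values are made up):

import requests

payload = {
    "instances": [{
        "WERKS": "1000", "SCENARIO": "S1", "VSTATU": "A", "EKGRP": "001",
        "EKORG": "0001", "KTOKK": "K1",
        "DIFGRIRD": 0.0, "TOTIRQTY": 10.0, "TOTGRQTY": 10.0,
        "VPATD": 30.0, "NODLGR": 2.0, "DIFGRIRV": 0.0, "NODLIR": 1.0,
    }]
}

# Assumes the server was started with --rest_api_port=8501 --model_name=my_model.
resp = requests.post(
    "http://localhost:8501/v1/models/my_model:predict", json=payload)
print(resp.json())  # {"predictions": [...]}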
I would like to fine-tune the embeddings produced by Google's Universal Sentence Encoder Large 3 (https://tfhub.dev/google/universal-sentence-encoder-large/3) on my own corpus. My current idea is to feed sentence pairs from my corpus to the encoder and then use an extra layer to classify whether they are semantically the same. My trouble is that I am not sure how to set this up, as it requires two USE models that share weights; I believe this is called a siamese network. Any help on how to do this would be greatly appreciated.
def train_and_evaluate_with_module(hub_module, train_module=False):
    embedded_text_feature_column1 = hub.text_embedding_column(
        key="sentence1", module_spec=hub_module, trainable=train_module)
    embedded_text_feature_column2 = hub.text_embedding_column(
        key="sentence2", module_spec=hub_module, trainable=train_module)

    estimator = tf.estimator.DNNClassifier(
        hidden_units=[500, 100],
        feature_columns=[embedded_text_feature_column1, embedded_text_feature_column2],
        n_classes=2,
        optimizer=tf.train.AdagradOptimizer(learning_rate=0.003))

    estimator.train(input_fn=train_input_fn, steps=1000)

    train_eval_result = estimator.evaluate(input_fn=predict_train_input_fn)
    test_eval_result = estimator.evaluate(input_fn=predict_test_input_fn)

    training_set_accuracy = train_eval_result["accuracy"]
    test_set_accuracy = test_eval_result["accuracy"]

    return {
        "Training accuracy": training_set_accuracy,
        "Test accuracy": test_set_accuracy
    }
See https://github.com/tensorflow/hub/issues/134: initialize one hub.Module(..., trainable=True) object and call it twice.
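A minimal sketch of that idea, assuming the TF1-style hub.Module API; the pair-classifier layers, loss and optimizer are assumptions, not taken from the issue:

import tensorflow as tf
import tensorflow_hub as hub

# One module object -> one set of weights, shared by both "towers".
embed = hub.Module(
    "https://tfhub.dev/google/universal-sentence-encoder-large/3",
    trainable=True)

sentences1 = tf.placeholder(tf.string, shape=[None])
sentences2 = tf.placeholder(tf.string, shape=[None])
labels = tf.placeholder(tf.float32, shape=[None])

emb1 = embed(sentences1)  # [batch, 512]
emb2 = embed(sentences2)  # same weights as emb1

# Simple pair classifier on top of the two embeddings.
pair = tf.concat([emb1, emb2, tf.abs(emb1 - emb2)], axis=1)
hidden = tf.layers.dense(pair, 100, activation=tf.nn.relu)
logits = tf.squeeze(tf.layers.dense(hidden, 1), axis=1)

loss = tf.reduce_mean(
    tf.nn.sigmoid_cross_entropy_with_logits(labels=labels, logits=logits))
train_op = tf.train.AdamOptimizer(1e-4).minimize(loss)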
I am trying to fine-tune the vgg_16 model with the Momentum optimizer. For this, I use the pretrained models from here.
Before fine-tuning, I assign the variable values from the pretrained model as follows:
variables_to_restore = slim.get_variables_to_restore(exclude=["vgg_16/fc8"])
init_assign_op, init_feed_dict = slim.assign_from_checkpoint(model_path, variables_to_restore)
Note that I do not exclude the vgg_16/*/*/Momentum variables. Hence I receive the error
ValueError: Checkpoint is missing variable [vgg_16/conv1/conv1_1/weights/Momentum],
as expected.
My problem is that including all the Momentum variables in the exclude list is very cumbersome (example). Is there a smarter way to exclude just the Momentum variables?
This is important since manually entering the exclusions is impractical for large models such as ResNet.
Thank you in advance!
You can solve this problem by using this code:
def _init_fn():
    # `exclusions` is built elsewhere, e.g. from FLAGS.checkpoint_exclude_scopes;
    # here it would simply be ["vgg_16/fc8"].
    variables_to_restore = []
    # get_model_variables() returns only model variables, so the optimizer's
    # Momentum slot variables never show up in this list.
    for var in slim.get_model_variables():
        excluded = False
        for exclusion in exclusions:
            if var.op.name.startswith(exclusion):
                excluded = True
                break
        if not excluded:
            variables_to_restore.append(var)

    if tf.gfile.IsDirectory(FLAGS.checkpoint_path):
        checkpoint_path = tf.train.latest_checkpoint(FLAGS.checkpoint_path)
    else:
        checkpoint_path = FLAGS.checkpoint_path

    tf.logging.info('Fine-tuning from %s' % checkpoint_path)

    return slim.assign_from_checkpoint_fn(
        checkpoint_path,
        variables_to_restore,
        ignore_missing_vars=FLAGS.ignore_missing_vars)
Use this function with slim.learning.train, i.e. pass init_fn=_init_fn().
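A minimal sketch of how that plugs into the slim training loop (train_op, total_loss, optimizer, FLAGS.train_dir and the step count are assumptions):

train_op = slim.learning.create_train_op(total_loss, optimizer)

slim.learning.train(
    train_op,
    logdir=FLAGS.train_dir,
    init_fn=_init_fn(),   # restores the VGG weights, minus fc8 and the Momentum slots
    number_of_steps=10000)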
I have a deep model to train on CIFAR-10. Training works fine with the CPU. However, when I use GPU support, the gradients for some batches become NaNs (I checked using tf.check_numerics), and it happens randomly but early in training. I believe the problem is related to my GPU.
My question is: is there a way to skip the update if at least one of the gradients contains NaNs, and force the model to proceed to the next batch?
Edit: Perhaps I should elaborate more on my problem.
This is how I apply the gradients:
with tf.control_dependencies(
        [tf.check_numerics(grad, message='Gradient %s check failed, possible NaNs' % var.name)
         for grad, var in grads]):
    # Apply the gradients to adjust the shared variables.
    apply_gradient_op = opt.apply_gradients(grads, global_step=global_step)
I have thought of using tf.check_numerics first to verify whether there are NaNs in the gradients, and then, if there are (the check failed), "pass" without calling opt.apply_gradients. However, is there a way to catch such an error with tf.control_dependencies?
I managed to figure it out, albeit not in the most elegant way.
My solution is as follows:
1) check all gradients first,
2) if the gradients are NaN-free, apply them,
3) otherwise, apply a fake update (with zero values); this needs a gradient override.
This is my code:
First define custom gradient:
@tf.RegisterGradient("ZeroGrad")
def _zero_grad(unused_op, grad):
    return tf.zeros_like(grad)
Then define an exception-handling function:
# This is added for the NaN check on the gradients.
def check_numerics_with_exception(grad, var):
    try:
        tf.check_numerics(grad, message='Gradient %s check failed, possible NaNs' % var.name)
    except:
        return tf.constant(False, shape=())
    else:
        return tf.constant(True, shape=())
Then create conditional node:
num_nans_grads = tf.Variable(1.0, name='num_nans_grads')
check_all_numeric_op = tf.reduce_sum(
    tf.cast(
        tf.stack([tf.logical_not(check_numerics_with_exception(grad, var))
                  for grad, var in grads]),
        dtype=tf.float32))

with tf.control_dependencies([tf.assign(num_nans_grads, check_all_numeric_op)]):
    # Apply the gradients to adjust the shared variables.
    def fn_true_apply_grad(grads, global_step):
        apply_gradients_true = opt.apply_gradients(grads, global_step=global_step)
        return apply_gradients_true

    def fn_false_ignore_grad(grads, global_step):
        # print('batch update ignored due to nans, fake update is applied')
        g = tf.get_default_graph()
        with g.gradient_override_map({"Identity": "ZeroGrad"}):
            for (grad, var) in grads:
                tf.assign(var, tf.identity(var, name="Identity"))
            apply_gradients_false = opt.apply_gradients(grads, global_step=global_step)
        return apply_gradients_false

    apply_gradient_op = tf.cond(
        tf.equal(num_nans_grads, 0.),
        lambda: fn_true_apply_grad(grads, global_step),
        lambda: fn_false_ignore_grad(grads, global_step))
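For what it's worth, a shorter variant of the same idea (a sketch, not the code above; it assumes dense gradients and the same grads, opt and global_step objects) is to zero out all gradients whenever any of them contains a NaN, which makes the update effectively a no-op without needing a gradient override:

# True only if no gradient contains a NaN.
grads_are_finite = tf.reduce_all(
    tf.stack([tf.reduce_all(tf.logical_not(tf.is_nan(g))) for g, _ in grads]))

# 1.0 when all gradients are finite, 0.0 otherwise.
mask = tf.cast(grads_are_finite, tf.float32)

# Zeroed gradients still go through apply_gradients, so the graph structure stays
# the same, but the variables are left (almost) untouched on bad batches.
masked_grads = [(g * mask, v) for g, v in grads]
apply_gradient_op = opt.apply_gradients(masked_grads, global_step=global_step)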