How to calculate the accuracy in the spaCy module?

I have developed a Named Entity Recognition module and am trying to calculate its accuracy with spaCy's Scorer.
I executed the code below and it gives me an error.
Code:
example = [
    ('Home page',
     [(0, 9, 'USER INTERFACE')]),
    ('login form',
     [(10, 20, 'MAIN COMPONENT')]),
    ('type should',
     [26, 30, 'MINI COMPONENT'])
]
def evaluate(nlp1, doc):
    scorer = Scorer()
    for input_, annot in doc:
        doc_gold_text = nlp1.make_doc(input_)
        gold = GoldParse(doc_gold_text, entities=annot)
        pred_value = nlp1(input_)
        scorer.score(pred_value, gold)
    return scorer.scores
Error:
TypeError: 'int' object is not iterable
How should I calculate the accuracy, recall and precision?
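The error most likely comes from the third training example: its annotation is the flat list [26, 30, 'MINI COMPONENT'] rather than a list containing one (start, end, label) tuple, so GoldParse tries to unpack the integer 26 and raises "'int' object is not iterable". A minimal sketch of the corrected data, assuming the spaCy v2 API that GoldParse belongs to (offsets kept as in the question):

from spacy.scorer import Scorer
from spacy.gold import GoldParse  # spaCy v2; v3 replaced GoldParse with Example

example = [
    ('Home page', [(0, 9, 'USER INTERFACE')]),
    ('login form', [(10, 20, 'MAIN COMPONENT')]),
    ('type should', [(26, 30, 'MINI COMPONENT')]),  # note the inner tuple
]

With that fixed, scorer.scores returns a dict in which 'ents_p', 'ents_r' and 'ents_f' are the NER precision, recall and F-score.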

Related

Tensorflow TabTransformer model get weights before training

I am working with the TensorFlow TabTransformer custom model and I am trying to get the initial weights right after declaration using model.get_weights(), but I get the following error:
from tabtransformertf.models.tabtransformer import TabTransformer
from tabtransformertf.utils.preprocessing import df_to_dataset, build_categorical_prep

numeric_features = ['TotPkts', 'TotBytes', 'Seq', 'Dur', 'Mean', 'StdDev', 'Sum', 'Min',
                    'Max', 'SrcBytes', 'DstBytes', 'Rate', 'SrcRate', 'DstRate']
cat_features = ['SrcPkts', 'DstPkts']
TARGET_FEATURE_NAME = "Category"

category_prep_layers = build_categorical_prep(tab_df, cat_features)  # tab_df is the dataframe I am working on

model = TabTransformer(
    numerical_features=numeric_features,
    categorical_features=cat_features,
    categorical_lookup=category_prep_layers,
    numerical_discretisers=None,  # simply passing the numeric features
    embedding_dim=32,
    out_dim=1,
    out_activation='sigmoid',
    depth=6,
    heads=5,
    attn_dropout=0.087687,
    ff_dropout=0.429539,
    mlp_hidden_factors=[1, 1],
    use_column_embedding=False,
)

model.get_weights()
ValueError: Weights for model sequential have not yet been created. Weights are created when the Model is first called on inputs or `build()` is called with an `input_shape`.
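This is standard Keras behaviour for subclassed models: their variables are created lazily, so get_weights() only works after the model has been built by seeing real inputs (or via build()). A minimal sketch, assuming a tf.data dataset has already been created with df_to_dataset (its exact call signature is not shown in the question):

# Run one forward pass on a single batch so Keras creates the variables,
# then the initial (pre-training) weights can be read out.
features, _labels = next(iter(train_dataset))  # train_dataset assumed to come from df_to_dataset
_ = model(features)                            # the first call builds the model
initial_weights = model.get_weights()          # now returns the freshly initialized weights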

Weights of pre-trained BERT model not initialized

I am using the Language Interpretability Toolkit (LIT) to load and analyze a BERT model that I pre-trained on an NER task.
However, when I'm starting the LIT script with the path to my pre-trained model passed to it, it fails to initialize the weights and tells me:
modeling_utils.py:648] loading weights file bert_remote/examples/token-classification/Data/Models/results_21_03_04_cleaned_annotations/04.03._8_16_5e-5_cleaned_annotations/04-03-2021 (15.22.23)/pytorch_model.bin
modeling_utils.py:739] Weights of BertForTokenClassification not initialized from pretrained model: ['bert.pooler.dense.weight', 'bert.pooler.dense.bias']
modeling_utils.py:745] Weights from pretrained model not used in BertForTokenClassification: ['bert.embeddings.position_ids']
It then simply uses the bert-base-german-cased version of BERT, which of course doesn't have my custom labels and thus fails to predict anything. I think it might have to do with PyTorch, but I can't find the error.
If relevant, here is how I load my dataset into CoNLL 2003 format (modification of the dataloader scripts found here):
def __init__(self):
    # Read ConLL Test Files
    self._examples = []
    data_path = "lit_remote/lit_nlp/examples/datasets/NER_Data"
    with open(os.path.join(data_path, "test.txt"), "r", encoding="utf-8") as f:
        lines = f.readlines()
    for line in lines[:2000]:
        if line != "\n":
            token, label = line.split(" ")
            self._examples.append({
                'token': token,
                'label': label,
            })
        else:
            self._examples.append({
                'token': "\n",
                'label': "O"
            })

def spec(self):
    return {
        'token': lit_types.Tokens(),
        'label': lit_types.SequenceTags(align="token"),
    }
And this is how I initialize the model and start the LIT server (modification of the simple_pytorch_demo.py script found here):
def __init__(self, model_name_or_path):
    self.tokenizer = transformers.AutoTokenizer.from_pretrained(
        model_name_or_path)
    model_config = transformers.AutoConfig.from_pretrained(
        model_name_or_path,
        num_labels=15,  # FIXME CHANGE
        output_hidden_states=True,
        output_attentions=True,
    )
    # This is just a regular PyTorch model.
    self.model = _from_pretrained(
        transformers.AutoModelForTokenClassification,
        model_name_or_path,
        config=model_config)
    self.model.eval()

## Some omitted snippets here

def input_spec(self) -> lit_types.Spec:
    return {
        "token": lit_types.Tokens(),
        "label": lit_types.SequenceTags(align="token")
    }

def output_spec(self) -> lit_types.Spec:
    return {
        "tokens": lit_types.Tokens(),
        "probas": lit_types.MulticlassPreds(parent="label", vocab=self.LABELS),
        "cls_emb": lit_types.Embeddings()
    }
This actually seems to be expected behaviour. In the documentation of the GPT models the HuggingFace team writes:
This will issue a warning about some of the pretrained weights not being used and some weights being randomly initialized. That’s because we are throwing away the pretraining head of the BERT model to replace it with a classification head which is randomly initialized.
So it seems to not be a problem for the fine-tuning. In my use case described above it worked despite the warning as well.
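For illustration, a minimal sketch of how this kind of warning arises (checkpoint name and label count are placeholders, not taken from the post): loading a base checkpoint into a token-classification architecture leaves the new classification head randomly initialized, which is exactly what the log lines above report.

import transformers

# The base checkpoint has no token-classification head, so transformers warns
# that some weights are newly initialized; the warning is expected and harmless
# once the head has been (or will be) fine-tuned.
model = transformers.AutoModelForTokenClassification.from_pretrained(
    "bert-base-german-cased",  # placeholder checkpoint
    num_labels=15,             # placeholder label count
)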

Performing inference with a BERT (TF 1.x) saved model

I'm stuck on one line of code and have been stalled on a project all weekend as a result.
I am working on a project that uses BERT for sentence classification. I have successfully trained the model, and I can test the results using the example code from run_classifier.py.
I can export the model using this example code (which has been reposted repeatedly, so I believe that it's right for this model):
def export(self):
    def serving_input_fn():
        label_ids = tf.placeholder(tf.int32, [None], name='label_ids')
        input_ids = tf.placeholder(tf.int32, [None, self.max_seq_length], name='input_ids')
        input_mask = tf.placeholder(tf.int32, [None, self.max_seq_length], name='input_mask')
        segment_ids = tf.placeholder(tf.int32, [None, self.max_seq_length], name='segment_ids')
        input_fn = tf.estimator.export.build_raw_serving_input_receiver_fn({
            'label_ids': label_ids, 'input_ids': input_ids,
            'input_mask': input_mask, 'segment_ids': segment_ids})()
        return input_fn
    self.estimator._export_to_tpu = False
    self.estimator.export_savedmodel(self.output_dir, serving_input_fn)
I can also load the exported estimator (where the export function saves the exported model into a subdirectory labeled with a timestamp):
predict_fn = predictor.from_saved_model(self.output_dir + timestamp_number)
However, for the life of me, I cannot figure out what to provide to predict_fn as input for inference. Here is my best code at the moment:
def predict(self):
    input = 'Test input'
    guid = 'predict-0'
    text_a = tokenization.convert_to_unicode(input)
    label = self.label_list[0]
    examples = [InputExample(guid=guid, text_a=text_a, text_b=None, label=label)]
    features = convert_examples_to_features(examples, self.label_list,
                                            self.max_seq_length, self.tokenizer)
    predict_input_fn = input_fn_builder(features, self.max_seq_length, False)
    predict_fn = predictor.from_saved_model(self.output_dir + timestamp_number)
    result = predict_fn(predict_input_fn)  # this generates an error
    print(result)
It doesn't seem to matter what I provide to predict_fn: the examples array, the features array, the predict_input_fn function. Clearly, predict_fn wants a dictionary of some type - but every single thing that I've tried generates an exception due to a tensor mismatch or other errors that generally mean: bad input.
I presumed that the from_saved_model function wants the same sort of input as the model test function - apparently, that's not the case.
It seems that lots of people have asked this very question - "how do I use an exported BERT TensorFlow model for inference?" - and have gotten no answers:
Thread #1
Thread #2
Thread #3
Thread #4
Any help? Thanks in advance.
Thank you for this post. Your serving_input_fn was the piece I was missing! Your predict function needs to be changed to feed the features dict directly, rather than using predict_input_fn:
def predict(sentences):
    labels = [0, 1]
    input_examples = [
        run_classifier.InputExample(
            guid="",
            text_a=x,
            text_b=None,
            label=0
        ) for x in sentences]  # here, "" is just a dummy label
    input_features = run_classifier.convert_examples_to_features(
        input_examples, labels, MAX_SEQ_LEN, tokenizer
    )
    # this is where pred_input_fn is replaced
    all_input_ids = []
    all_input_mask = []
    all_segment_ids = []
    all_label_ids = []
    for feature in input_features:
        all_input_ids.append(feature.input_ids)
        all_input_mask.append(feature.input_mask)
        all_segment_ids.append(feature.segment_ids)
        all_label_ids.append(feature.label_id)
    pred_dict = {
        'input_ids': all_input_ids,
        'input_mask': all_input_mask,
        'segment_ids': all_segment_ids,
        'label_ids': all_label_ids
    }
    predict_fn = predictor.from_saved_model('../testing/1589418540')
    result = predict_fn(pred_dict)
    print(result)
pred_sentences = [
    "That movie was absolutely awful",
    "The acting was a bit lacking",
    "The film was creative and surprising",
    "Absolutely fantastic!",
]

predict(pred_sentences)
{'probabilities': array([[-0.3579178 , -1.2010787 ],
[-0.36648935, -1.1814401 ],
[-0.30407643, -1.3386648 ],
[-0.45970002, -0.9982413 ],
[-0.36113673, -1.1936386 ],
[-0.36672896, -1.1808994 ]], dtype=float32), 'labels': array([0, 0, 0, 0, 0, 0])}
However, the probabilities returned for sentences in pred_sentences do not match the probabilities I get using estimator.predict(predict_input_fn), where estimator is the fine-tuned model being used within the same (Python) session. For example, [-0.27276006, -1.4324446 ] using estimator vs [-0.26713806, -1.4505868 ] using predictor.
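A side note on reading these numbers, offered as an observation rather than a confirmed fact: the 'probabilities' values above are negative, which matches the log-softmax output that the stock run_classifier.py emits. If that is what both paths return, exponentiating recovers ordinary probabilities for comparison:

import numpy as np

log_probs = np.array([-0.3579178, -1.2010787])  # one row from the output above
probs = np.exp(log_probs)                        # approx. [0.699, 0.301], sums to ~1
print(probs)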

How can I use 38 classes instead of 1000 in model.predict decode_predictions

I am getting an error in plant disease detection using a ResNet50 deep learning model; every time it raises an error message in decode_predictions.
Error:
expects a batch of predictions (i.e. a 2D array of shape (samples, 1000)). Found array with shape: (1, 38)
model = ResNet50(weights='imagenet', include_top=False, classes=38)
try:
    model = load_model('/content/drive/My Drive/color/checkpoints/ResNet50_model_weights.h5')
    print("model loaded")
except:
    print("model not loaded")

img_path = '/content/drive/My Drive/color/test/0/appleblackrot188.jpg'
img = image.load_img(img_path, target_size=(300, 300))
x = image.img_to_array(img)
x = np.expand_dims(x, axis=0)
x = preprocess_input(x)
preds = model.predict(x)
print('Predicted:', decode_predictions(preds, top=3)[0])
You can try using the preprocessing function:
import tensorflow as tf
import numpy as np

# Using the keras wrapper on tensorflow (it must be the same using just keras).
IMAGE = []  # From image source, I did it from the camera.
toPred = tf.keras.applications.resnet50.preprocess_input(
    np.expand_dims(tf.keras.preprocessing.image.img_to_array(IMAGE), axis=0))
Maybe that can help :)
decode_predictions works only for ImageNet (number of classes = 1000). For your 38 plant classes, you have to write your own decode_predictions based on the ground-truth labels you've assigned to each plant.
First, you need an index JSON file, and then you create a new decode_predictions function.
For example:
Take HAM10000, which has 7 classes, with the images split into one folder per class.
Then make an index JSON file like this:
{
    "0": ["AKIEC", "akiec"],
    "1": ["BCC", "bcc"],
    "2": ["BKL", "bkl"],
    "3": ["DF", "df"],
    "4": ["MEL", "mel"],
    "5": ["NV", "nv"],
    "6": ["VASC", "vasc"]
}
Then try this code
def decode_predictions(preds, top=4, class_list_path='/content/SKIN_DATA/HAM10000_index.json'):
    if len(preds.shape) != 2 or preds.shape[1] != 7:  # your classes number
        raise ValueError('`decode_predictions` expects '
                         'a batch of predictions '
                         '(i.e. a 2D array of shape (samples, 1000)). '
                         'Found array with shape: ' + str(preds.shape))
    index_list = json.load(open(class_list_path))
    results = []
    for pred in preds:
        top_indices = pred.argsort()[-top:][::-1]
        result = [tuple(index_list[str(i)]) + (pred[i],) for i in top_indices]
        result.sort(key=lambda x: x[2], reverse=True)
        results.append(result)
    return results
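For completeness, a short usage sketch (the model, the preprocessing pipeline and the 7-class index file above are assumed):

preds = model.predict(x)                    # x preprocessed as in the question; preds must have shape (samples, 7)
top3 = decode_predictions(preds, top=3)[0]  # list of (LABEL, label, score) tuples, best first
print('Predicted:', top3)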

"Output 0 of type double does not match declared output type string" while running the iris sample program in TensorFlow Serving

I am running the sample iris program in TensorFlow Serving. Since it is a TF.Learn model, I am exporting it with classifier.export(export_dir=model_dir, signature_fn=my_classification_signature_fn), where the signature_fn is defined as shown below:
def my_classification_signature_fn(examples, unused_features, predictions):
    """Creates classification signature from given examples and predictions.

    Args:
      examples: `Tensor`.
      unused_features: `dict` of `Tensor`s.
      predictions: `Tensor` or dict of tensors that contains the classes tensor
        as in {'classes': `Tensor`}.

    Returns:
      Tuple of default classification signature and empty named signatures.

    Raises:
      ValueError: If examples is `None`.
    """
    if examples is None:
        raise ValueError('examples cannot be None when using this signature fn.')
    if isinstance(predictions, dict):
        default_signature = exporter.classification_signature(
            examples, classes_tensor=predictions['classes'])
    else:
        default_signature = exporter.classification_signature(
            examples, classes_tensor=predictions)
    named_graph_signatures = {
        'inputs': exporter.generic_signature({'x_values': examples}),
        'outputs': exporter.generic_signature({'preds': predictions})}
    return default_signature, named_graph_signatures
The model gets successfully exported using the following piece of code.
I have created a client which makes real-time predictions using TensorFlow Serving.
The following is the code for the client:
flags.DEFINE_string("model_dir", "/tmp/iris_model_dir", "Base directory for output models.")
tf.app.flags.DEFINE_integer('concurrency', 1,
                            'maximum number of concurrent inference requests')
tf.app.flags.DEFINE_string('server', '', 'PredictionService host:port')

# connection
host, port = FLAGS.server.split(':')
channel = implementations.insecure_channel(host, int(port))
stub = prediction_service_pb2.beta_create_PredictionService_stub(channel)

# Classify two new flower samples.
new_samples = np.array([5.8, 3.1, 5.0, 1.7], dtype=float)
request = predict_pb2.PredictRequest()
request.model_spec.name = 'iris'
request.inputs["x_values"].CopyFrom(
    tf.contrib.util.make_tensor_proto(new_samples))
result = stub.Predict(request, 10.0)  # 10 secs timeout
However, on making the predictions, the following error is displayed:
grpc.framework.interfaces.face.face.AbortionError: AbortionError(code=StatusCode.INTERNAL, details="Output 0 of type double does not match declared output type string for node _recv_input_example_tensor_0 = _Recv[client_terminated=true, recv_device="/job:localhost/replica:0/task:0/cpu:0", send_device="/job:localhost/replica:0/task:0/cpu:0", send_device_incarnation=2016246895612781641, tensor_name="input_example_tensor:0", tensor_type=DT_STRING, _device="/job:localhost/replica:0/task:0/cpu:0"]()")
(The entire stack trace was attached as a screenshot and is not reproduced here.)
The iris model is defined in the following manner:
# Specify that all features have real-value data
feature_columns = [tf.contrib.layers.real_valued_column("", dimension=4)]

# Build 3 layer DNN with 10, 20, 10 units respectively.
classifier = tf.contrib.learn.DNNClassifier(feature_columns=feature_columns,
                                            hidden_units=[10, 20, 10],
                                            n_classes=3, model_dir=model_dir)

# Fit model.
classifier.fit(x=training_set.data,
               y=training_set.target,
               steps=2000)
Kindly suggest a solution for this error.
I think the problem is that your signature_fn is going on the else branch and passing predictions as the output to the classification signature, which expects a string output and not a double output. Either use a regression signature function or add something to the graph to get the output in the form of a string.
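A minimal sketch of the second option, casting the class ids to strings before building the classification signature (exporter is the same tf.contrib.session_bundle exporter used in the question, and this is untested against that TF version):

# Inside my_classification_signature_fn: make the classes tensor a string tensor,
# since the classification signature declares a DT_STRING output.
classes = predictions['classes'] if isinstance(predictions, dict) else predictions
default_signature = exporter.classification_signature(
    examples, classes_tensor=tf.as_string(classes))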