Spacy v3 - list of tags and dependencies cannot be found

I tried to find the list of tags and dependencies for spacy v3, but I couldn't.
Does anyone know where the following lists can be found, but for v3?
v2 tags: https://v2.spacy.io/api/annotation#pos-tagging
v2 dependencies: https://v2.spacy.io/api/annotation#dependency-parsing

The lists are on the individual model pages, under "Label Scheme". See the English models page (https://spacy.io/models/en) for an example.
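If it helps, the same label lists can also be read from a loaded pipeline instead of the website. The following is only a minimal sketch: `label_scheme` is a hypothetical helper name, and it assumes the object passed in behaves like a spaCy pipeline whose `pipeline` attribute is a list of (name, component) pairs, with label-assigning components exposing a `labels` attribute.

```python
def label_scheme(nlp):
    """Map each pipeline component name to the sorted labels it can assign.

    `nlp` is assumed to be a loaded spaCy pipeline; only its `pipeline`
    attribute (a list of (name, component) pairs) is used here.
    """
    scheme = {}
    for name, component in nlp.pipeline:
        # Components without labels (or with an empty label set) are skipped.
        labels = getattr(component, "labels", None)
        if labels:
            scheme[name] = sorted(labels)
    return scheme
```

With a downloaded model this would be called as `label_scheme(spacy.load("en_core_web_sm"))`, and `spacy.explain(label)` can then be used to get a short description of an individual tag or dependency label.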

Related

Spacy entity linking: wiki dataset not connected

This relates to the Spacy entity linking library: https://github.com/egerber/spaCy-entity-linker
When I use the following code:
# python -m spacy_entity_linker "download_knowledge_base"
import spacy
nlp = spacy.load("en_core_web_md")
nlp.add_pipe("entity_linker", last=True)
doc = nlp("I watched the Pirates of the Caribbean last silvester")
all_linked_entities = doc._.linkedEntities
for sent in doc.sents:
    sent._.linkedEntities.pretty_print()
I get: ValueError: [E139] Knowledge base for component 'entity_linker' is empty. Use the methods kb.add_entity and kb.add_alias to add entries.
I might need to register the downloaded knowledge base somewhere, but the documentation does not say where.
The original example code uses a different component name in add_pipe:
nlp.add_pipe("entityLinker", last=True)
But then I get the error:
ValueError: [E002] Can't find factory for 'entityLinker' for language English (en). This usually happens when spaCy calls nlp.create_pipe with a custom component name that's not registered on the current language class
Where are things going wrong?
I installed the required libraries:
# pip install spacy-entity-linker
# python -m spacy_entity_linker "download_knowledge_base"
I also have spaCy, the language model, and Python 3.8 installed.
The spaCy entity linker library assumes that spaCy itself is already set up. You also need to download a language model:
python -m spacy download en_core_web_md

No pos tags by Spacy's multilingual xx_ent_wiki_sm

I am using spaCy's multilingual model xx_ent_wiki_sm as a POS tagger. The problem is that it doesn't return any POS tags. If you have encountered the same issue, please share your ideas or solution. Thank you!
Code in Python:
import spacy
nlp = spacy.load('xx_ent_wiki_sm')
doc = nlp(u'Por David García')
print(' '.join('{word}/{tag}'.format(word=t.orth_, tag=t.pos_) for t in doc))
Output:
Por/ David/ García/
This model does not contain a part-of-speech tagger; it only contains a named entity recognizer.
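One way to catch this before reading `token.pos_` is to look at the component names of the loaded pipeline. The sketch below is an illustration only: `can_assign_pos` is a hypothetical helper, and it assumes the components that set POS tags are registered under spaCy's default names ("tagger" or "morphologizer"), so a pipeline that uses custom component names would not be detected.

```python
def can_assign_pos(nlp):
    """Return True if the pipeline lists a component that normally sets POS tags.

    Only `nlp.pipe_names` is inspected; the default component names are assumed.
    """
    pos_components = {"tagger", "morphologizer"}
    return any(name in pos_components for name in nlp.pipe_names)
```

For a pipeline that only contains a named entity recognizer, this returns False, which explains the empty tags in the output above.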

COCO json annotation to YOLO txt format

How do I convert a single COCO JSON annotation file into the YOLO Darknet format, where each individual image has its own separate filename.txt file?
My classmates and I have created a python package called PyLabel to help others with this task and other labelling tasks.
Our package does this conversion! You can see an example in this notebook https://github.com/pylabel-project/samples/blob/main/coco2yolov5.ipynb.
Your answer should be in there! But you should be able to do this conversion by doing something like:
!pip install pylabel
from pylabel import importer
dataset = importer.ImportCoco(path=path_to_annotations, path_to_images=path_to_images)
dataset.export.ExportToYoloV5(dataset)
You can find the source code that is used behind the scenes here https://github.com/pylabel-project/
I built a tool
https://github.com/tw-yshuang/coco2yolo
Download this repo and use the following command:
python3 coco2yolo.py [OPTIONS]
Usage: coco2yolo.py [OPTIONS] [CAT_INFOS]...

Options:
  -ann-path, --annotations-path TEXT
      JSON file. Path for label. [required]
  -img-dir, --image-download-dir TEXT
      The directory of the image data place.
  -task-dir, --task-categories-dir TEXT
      Build a directory that follows the task-required categories.
  -cat-t, --category-type TEXT
      Category input type. (interactive | file) [default: interactive]
  -set, --set-computing-type TEXT
      Set Computing for the data. (union | intersection) [default: union]
  --help
      Show this message and exit.
There is an open-source tool called makesense.ai for annotating your images. You can download YOLO txt format once you annotate your images. But you won't be able to download the annotated images.
There are three ways:
1. Use Roboflow: https://roboflow.com/formats (you can find other solutions as well). There are usage guides for Roboflow, e.g. https://medium.com/red-buffer/roboflow-d4e8c4b52515
2. Search for 'convert coco format to yolo format'; you will find open-source code that converts annotations to the YOLO format.
3. Write your own code to convert the COCO format to the YOLO format.
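For the third option, writing your own converter, a minimal sketch could look like the following. It assumes the standard COCO detection layout (`images`, `annotations`, `categories`, with `bbox` given as `[x_min, y_min, width, height]` in pixels) and the usual YOLO label line `class x_center y_center width height`, normalised by the image size. The function names are hypothetical, and remapping category ids to 0..N-1 in sorted order is an assumption that you may need to adapt to your class list.

```python
import json
from collections import defaultdict
from pathlib import Path


def coco_to_yolo_lines(coco):
    """Return {image file name: [YOLO label lines]} for a parsed COCO dict."""
    # YOLO class indices are contiguous and start at 0; COCO ids need not be.
    categories = sorted(coco["categories"], key=lambda cat: cat["id"])
    class_index = {cat["id"]: i for i, cat in enumerate(categories)}
    images = {img["id"]: img for img in coco["images"]}

    lines = defaultdict(list)
    for ann in coco["annotations"]:
        img = images[ann["image_id"]]
        x_min, y_min, box_w, box_h = ann["bbox"]
        # Convert the top-left corner to a box centre, then normalise to [0, 1].
        x_center = (x_min + box_w / 2) / img["width"]
        y_center = (y_min + box_h / 2) / img["height"]
        lines[img["file_name"]].append(
            f"{class_index[ann['category_id']]} {x_center:.6f} {y_center:.6f} "
            f"{box_w / img['width']:.6f} {box_h / img['height']:.6f}"
        )
    return dict(lines)


def write_yolo_labels(annotation_path, output_dir):
    """Write one <image stem>.txt label file per annotated image."""
    coco = json.loads(Path(annotation_path).read_text())
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    for file_name, rows in coco_to_yolo_lines(coco).items():
        label_file = out / Path(file_name).with_suffix(".txt").name
        label_file.write_text("\n".join(rows) + "\n")
```

Calling `write_yolo_labels("annotations.json", "labels")` (both paths are placeholders) would produce one text file per image that has at least one annotation; images without annotations get no file in this sketch.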

Spacy EN Model issue

I need to know the difference between spaCy's en and en_core_web_sm models.
I am trying to do NER with spaCy (for organization names).
Please find below the script I am using:
import spacy
nlp = spacy.load("en_core_web_sm")
text = "But Google is starting from behind. The company made a late push \
into hardware, and Apple’s Siri, available on iPhones, and Amazon’s \
Alexa software, which runs on its Echo and Dot devices, have clear \
leads in consumer adoption."
doc = nlp(text)
for ent in doc.ents:
    print(ent.text, ent.start_char, ent.end_char, ent.label_)
The script above produces no output.
But when I use the "en" model:
import spacy
nlp = spacy.load("en")
text = "But Google is starting from behind. The company made a late push \
into hardware, and Apple’s Siri, available on iPhones, and Amazon’s \
Alexa software, which runs on its Echo and Dot devices, have clear \
leads in consumer adoption."
doc = nlp(text)
for ent in doc.ents:
    print(ent.text, ent.start_char, ent.end_char, ent.label_)
it gives me the desired output:
Google 4 10 ORG
Apple’s Siri 92 104 ORG
iPhones 119 126 ORG
Amazon 132 138 ORG
Echo and Dot 182 194 ORG
What is going wrong here?
Please help.
Can I use the en_core_web_sm model to get the same output as the en model? If so, please advise how to do it. A Python 3 script that takes a pandas DataFrame as input would be appreciated. Thanks.
Each model is a machine learning model trained on a specific corpus (a text 'dataset'). As a result, each model can tag the same text differently, especially because some models were trained on less data than others.
Currently spaCy offers 4 models for English, as presented in: https://spacy.io/models/en/
According to https://github.com/explosion/spacy-models, a model can be downloaded in several distinct ways:
# download best-matching version of specific model for your spaCy installation
python -m spacy download en_core_web_sm
# out-of-the-box: download best-matching default model
python -m spacy download en
Probably, when you downloaded the 'en' model, the best matching default model was not 'en_core_web_sm'.
Also, keep in mind that these models are updated every once in a while, which may have caused you to have two different versions of the same model.
Loading spacy.load('en_core_web_sm') instead of spacy.load('en') should help.
On my system the results are the same in both cases.
Code:
import spacy
nlp = spacy.load("en_core_web_sm")
text = """But Google is starting from behind. The company made a late push
into hardware, and Apple’s Siri, available on iPhones, and Amazon’s
Alexa software, which runs on its Echo and Dot devices, have clear
leads in consumer adoption."""
doc = nlp(text)
for ent in doc.ents:
    print(ent.text, ent.start_char, ent.end_char, ent.label_)
import spacy
nlp = spacy.load("en")
text = """But Google is starting from behind. The company made a late push \
into hardware, and Apple’s Siri, available on iPhones, and Amazon’s \
Alexa software, which runs on its Echo and Dot devices, have clear
leads in consumer adoption."""
doc = nlp(text)
for ent in doc.ents:
    print(ent.text, ent.start_char, ent.end_char, ent.label_)

Using spacy visualizer with custom data

I want to visualize a sentence using spaCy's named entity visualizer. I have a sentence with some user-defined labels over the tokens, and I want to visualize them using the NER rendering API.
I don't want to train and produce a predictive model. I have all the labels I need from an external source and just need the visualization, without messing too much with front-end libraries.
Any ideas how?
Thank you
You can manually modify the list of entities (doc.ents) and add new spans using token offsets. Be aware that entities can't overlap at all.
import spacy
from spacy.tokens import Span
nlp = spacy.load('en', disable=['ner'])
doc = nlp("I see an XYZ.")
doc.ents = list(doc.ents) + [Span(doc, 3, 4, "NEWENTITYTYPE")]
print(doc.ents[0], doc.ents[0].label_)
Output:
XYZ NEWENTITYTYPE
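An alternative that avoids building a Doc at all is displaCy's manual mode, which accepts a plain dict with the text and character offsets of the entities. The sketch below is one possible way to build that dict from externally labelled tokens. `tokens_to_displacy` and `render_entities` are hypothetical helper names, "O" as the marker for unlabelled tokens is an assumption, and tokens are simply joined with single spaces, so punctuation will be separated from the preceding word.

```python
def tokens_to_displacy(tokens, labels, outside="O"):
    """Build the dict used by displaCy's manual mode from tokens and labels.

    Tokens are joined with single spaces; consecutive tokens that share the
    same label are merged into one entity span.
    """
    ents = []
    offset = 0
    previous = outside
    for token, label in zip(tokens, labels):
        start, end = offset, offset + len(token)
        if label != outside:
            if label == previous and ents:
                ents[-1]["end"] = end  # extend the span that is still open
            else:
                ents.append({"start": start, "end": end, "label": label})
        previous = label
        offset = end + 1  # account for the joining space
    return {"text": " ".join(tokens), "ents": ents, "title": None}


def render_entities(tokens, labels):
    """Return HTML markup for the labelled tokens (spaCy is imported lazily)."""
    from spacy import displacy

    data = tokens_to_displacy(tokens, labels)
    return displacy.render(data, style="ent", manual=True, page=True, jupyter=False)
```

The returned markup can be written to an .html file and opened in a browser. Note that adjacent entities with the same label are merged here, which may not be what you want if your source distinguishes them.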