spacy Entity Linking at paragraph level - spacy

Entity linking works well on sentence-level text. How can I use spaCy entity linking on paragraph-level text instead? Any guidance would be helpful.

Related

Add custom punctuation to spacy model

How do you add custom punctuation (e.g. an asterisk) to the infix list in a Tokenizer and have it recognized by nlp.explain as punctuation? I would like to add characters that are not currently recognized as punctuation to the punctuation list, taken from the set of infixes, so that the Matcher can use them when matching {'IS_PUNCT': True}.
An answer to a similar issue was provided here: "How can I add custom signs to spaCy's punctuation functionality?"
The only problem is I am unable to package the newly recognized punctuation with the model. A side note: the tokenizer already recognizes infixes with the desired punctuation, so all that is left is propagating this to the Matcher.
The lexeme attribute IS_PUNCT is completely separate from any of the tokenizer settings. In a packaged pipeline, you'd either create a custom language (https://spacy.io/usage/linguistic-features#language-subclass) or run the customization in a callback in nlp.before_creation (https://spacy.io/usage/training#custom-code-nlp-callbacks).
Be aware that modifying EnglishDefaults affects all English pipelines loaded in the same script, so the custom language option is cleaner (in particular if you're distributing this model for general use), but also slightly more work to implement.
On the other hand, if you're just using the Matcher, it might be easier to use a REGEX pattern to match the tokens you want instead of customizing IS_PUNCT.
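The REGEX approach can be sketched as follows. This is a minimal example, assuming spaCy v3 (where Matcher.add takes a list of patterns); the pattern name CUSTOM_PUNCT and the character set are hypothetical choices for illustration. The Doc is built by hand so that the result does not depend on any tokenizer settings.

```python
import spacy
from spacy.matcher import Matcher
from spacy.tokens import Doc

nlp = spacy.blank("en")  # blank pipeline, no trained model required

# Build the Doc from explicit tokens so tokenizer settings play no role here.
doc = Doc(nlp.vocab, words=["a", "*", "b", "|", "c"])

matcher = Matcher(nlp.vocab)
# Match any single token made up of one character from the chosen set,
# instead of relying on the IS_PUNCT lexeme attribute.
pattern = [{"TEXT": {"REGEX": r"^[*|~]$"}}]
matcher.add("CUSTOM_PUNCT", [pattern])  # "CUSTOM_PUNCT" is a hypothetical name

# Collect (token index, token text) pairs for every match.
matched = sorted((start, doc[start:end].text) for _, start, end in matcher(doc))
print(matched)
```

Because the pattern lives in the Matcher rather than in the vocabulary, nothing needs to be packaged with the model beyond the pattern itself.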

Build something similar to sciSpacy, but say for another domain

I want to build a model similar to sciSpacy, but for another domain. How should I go about this?
You'll have to first make sure you have enough training data about your new domain. If you want to have a Named Entity Recognizer, you need texts annotated with named entities. If you want to have a parser, you need texts with dependency annotations. If you want a POS tagger, you need texts annotated with POS tags, etc.
Then you can create a new blank model, add the component(s) you need to it, and start training them:
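import spacy

# spaCy v2-style API; texts and annotations are your own annotated training data
nlp = spacy.blank("fr")            # blank French pipeline with no components
ner = nlp.create_pipe("ner")       # create an untrained NER component
nlp.add_pipe(ner)                  # add it to the pipeline
ner.add_label("MY_DOMAIN_LABEL")   # register each domain-specific label
nlp.begin_training()               # initialize the model weights
nlp.update(texts, annotations, drop=0.2)  # one training step with dropout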
This code snippet is not complete, because it really depends on what exactly it is you want to do. You can find more complete snippets in the documentation: https://spacy.io/usage/training
You might also be interested in the command-line utility for training new models; see https://spacy.io/api/cli#train
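The snippet in this answer uses calls from the spaCy v2 API. For readers on spaCy v3, the same steps might look roughly like the sketch below. It assumes spaCy v3 is installed; the training sentence, the entity offsets, and the number of iterations are made up for illustration, and a single example is far too little data for a usable model.

```python
import spacy
from spacy.training import Example

nlp = spacy.blank("fr")            # blank French pipeline
ner = nlp.add_pipe("ner")          # v3: add the component by its string name
ner.add_label("MY_DOMAIN_LABEL")   # register each domain-specific label

# Hypothetical toy data: character offsets (17, 25) cover the word "aspirine".
train_data = [
    ("Le patient prend aspirine chaque jour.",
     {"entities": [(17, 25, "MY_DOMAIN_LABEL")]}),
]
examples = [
    Example.from_dict(nlp.make_doc(text), annotations)
    for text, annotations in train_data
]

# v3 replaces begin_training() with initialize(), which returns the optimizer.
optimizer = nlp.initialize(lambda: examples)

losses = {}
for _ in range(5):  # a handful of passes, only to show the call pattern
    nlp.update(examples, sgd=optimizer, drop=0.2, losses=losses)

print(losses)
```

For real projects, the config-driven command-line training workflow is usually the more maintainable route than a hand-written loop.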

Completing attribute name in RubyMine?

I'm trying out Rails in RubyMine 2016.2.4. How do I use code completion for a model's attribute names? In Yii2 the model's attributes are listed in comments.
The internal structure and philosophy of Rails differ from Yii2's. The current database state is stored in a separate file, db/schema.rb. This file is updated automatically whenever new migrations are applied and is not intended for manual editing. In Yii2 you need to keep the PHPDoc comments in sync with the current DB state manually. In Rails, on the other hand, you can't see which attributes a model has just by looking at the model (although the model code is very concise as a result).
I'm not sure, but I think RubyMine extracts the column names for model attribute autocompletion from the corresponding table definition in that file. You can read more about db/schema.rb in the official docs.
There is also a dedicated help section in the RubyMine docs about Rails-Aware Code Completion.
So it works, but probably only in specific places.
Last but not least, check this related SO question. RubyMine provides very good autocomplete options, but don't hesitate to peek at db/schema.rb if needed, or use DB management tools to see column names and data during development.

ANTLR4 - Generate code from non-file inputs?

Where do we start to manually build a CST from scratch? Or does ANTLR4 always require the lex/parse process as our input step?
I have some visual elements in my program that represent code structures.
e.g. a square represents a class, while a circle embedded within that square represents a method.
Now I want to turn those into code. How do I use ANTLR4 to do this, at runtime (using ANTLR4.js)? Most of the ANTLR examples seem to rely on lexing and parsing existing code to get to a syntax tree. So rather than:
input code->lex->parse->syntax tree->output code (1)
I want
manually create syntax tree->output code (2)
(Later, as the user adds code to that class and its methods, then ANTLR will be used as in (1).)
EDIT Maybe I'm misunderstanding this. Do I create some custom data structure and then run the parser over it? i.e. write structures to some in-memory format->parse->output code (3)?
IIUC, you could use StringTemplate directly.
By way of background, Antlr itself builds an in-memory parse-tree and then walks it, incrementally calling StringTemplate to output code snippets qualified by the corresponding parse-tree node data. That Antlr uses an internal parse-tree is just a convenience that simplifies walking (since Antlr is built using Antlr).
If you have your own data structure, regardless of its specific implementation, procedurally process it to progressively call ST templates to emit the corresponding code. And, you can directly use the same templates that Antlr uses (JavaScript.stg), if they meet your requirements.
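The idea of walking your own data structure and filling one template per node can be sketched as below. This is only an illustration in Python: string.Template from the standard library stands in for StringTemplate, and the ClassShape and MethodShape types, the template texts, and the emitted JavaScript-like syntax are all hypothetical, not taken from Antlr's own templates.

```python
from dataclasses import dataclass, field
from string import Template


# Hypothetical model of the visual elements: a square is a class,
# a circle embedded in it is a method.
@dataclass
class MethodShape:
    name: str


@dataclass
class ClassShape:
    name: str
    methods: list = field(default_factory=list)


# Stand-ins for StringTemplate templates (illustrative output syntax).
CLASS_TEMPLATE = Template("class $name {\n$body\n}")
METHOD_TEMPLATE = Template("  $name() {\n  }")


def emit_class(shape: ClassShape) -> str:
    """Walk the structure and fill one template per node, no lexing or parsing."""
    body = "\n".join(
        METHOD_TEMPLATE.substitute(name=method.name) for method in shape.methods
    )
    return CLASS_TEMPLATE.substitute(name=shape.name, body=body)


generated = emit_class(
    ClassShape("Account", [MethodShape("deposit"), MethodShape("withdraw")])
)
print(generated)
```

The point of the design is that the emitter only depends on the shape of your data, so no grammar is involved until the user starts typing code that actually has to be parsed.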
Of course, if your data structure is of a nature that can be lex'd/parsed into a standard Antlr parse-tree, you can then use a standard Antlr visitor to call and populate node-specific templates.

ANTLR Comment Propagation

Is there a method to propagate a comment in ANTLR to the code generated?
e.g. if I have the Subversion revision number keyword ($Rev$) in a comment within the *.g4 file, is there a way to carry it into the generated code, so that I know the parser was generated from that revision's version of the language?
Cheers,
Adam
At this time, we are not copying the comments from the grammar into the generated code, although we should. Added https://github.com/antlr/antlr4/issues/375