Apache OpenNLP: How do I implement a dictionary based entity recognition?

Apache OpenNLP: How do I implement a dictionary based entity recognition? - apache

I have already downloaded the jar files to eclipse.
http://opennlp.apache.org/documentation/1.5.3/apidocs/opennlp-tools/index.html
How do I do the following:
1.) Be able to add my own names and tags.
2.) Be able to get the names and tags that were in the dictionary.
3.) Configure between case sensitive and insensitive.
For example, let's say, I add the name "Mike Smith" with name tag "Author".
If I have text that has that name, it should be able to recognize that its there along with the tag.
Please give actual java code!!!

I have asked a very similar question here:
Is it possible to conduct 'Context Analysis' for precise entity extraction with OpenNLP?
general concensus is that its 2 steps, first to identify if your sentence contains Author, the second to find the name.
I too would like to do it in 1 step (where the analysis of the corpus includes the words within itself as a way to determine the context of the name)

Related

Can we reference tags in karate? [duplicate]

I'm wondering if you can use wildcard characters with tags to get all tagged scenarios/features that match a certain pattern.
For example, I've used 17 unique tags on many scenarios throughout many of my feature files. The pattern is "#jira=CIS-" followed by 4 numbers, like #jira=CIS-1234 and #jira=CIS-5678.
I'm hoping I can use a wildcard character or something that will find all of the matches for me.
I want to be able to exclude them from being run, when I run all of my features/scenarios.
I've tried the follow:
--tags ~#jira
--tags ~#jira*
--tags ~#jira=*
--tags ~#jira=
Unfortunately none have given my the results I wanted. I was only able to exclude them when I used the exact tag, ex. ~#jira=CIS-1234. It's not a good solution to have to add each single one (of the 17 different tags) to the command line. These tags can change frequently, with new ones being added and old ones being removed, plus it would make for one real long command.

Yes. First read this - there is this un-documented expression-language (based on JS) for advanced tag selction based on the #key=val1,val2 form: https://stackoverflow.com/a/67219165/143475
So you should be able to do this:
valuesFor('#jira').isPresent
And even (here s will be a string, on which you can even do JS regex if you know how):
valuesFor('#jira').isEach(s => s.startsWith('CIS-'))
Would be great to get your confirmation and then this thread itself can help others and we can add it to the docs at some point.

cTAKES UMLS ICD10 codes lookup

I created a cTAKES custom dictionary from UMLS database with ICD10 codes.
Right now I able to analyze the text by for example disease name, like Asthma and annotation index will contain the ICD10 code for this matching code = "J45.90".
Is it possible to configure cTAKES in order to reverse this process in order to look for ICD10 code appearance in the text instead?

The XML output contains the start and ends of a matched concept in the original corpus. I personally find it easier to convert the XML to a simple JSON format and then loop through it as needed.
I have been working on an open source solution for parsing out the data and displaying the corpus with the matches it in HTML: https://github.com/GoTeamEpsilon/ctakes-friendly-web-ui#demonstration - let me know if you'd like to contribute.

Creating Configuration File for DDS Recording Service

I'm a beginner looking for some clarity on how to create configuration files for the DDS Recording Service in two areas.
If you are looking to record a set of specific topics from a domain how do you set up the topic group? Can you list the topics as individual <topic_expr> i.e.
<topic_group name="SomeTopics">
<topics>
<topic_expr>topic2</topic_expr>
<topic_expr>topic8</topic_expr>
</topics>
<field_expr>*</field_expr>
</topic_group>
When I tried something like this not all the listed topics would be recorded. Is there something I am overlooking?
Secondly, when you use -deserialize to you need to make any changes to the configuration file you used to record the database? As I sometimes get errors about how "rti dds failed to find" followed by something like X::Y::Z. Thanks.

The XSD schema for the configuration file does not expect you to use multiple <topic_expr> tags, but a single tag with a comma-separated list of Topic names. The RTI Recording Service User's Manual explains it as follows:
<topic_expr>POSIX fn expression</topic_expr>
Required.
A comma-separated list of POSIX expressions that specify the names of Topics to be included in the TopicGroup.
The syntax and semantics are the same as for Partition matching.
Default: Null
Note: Keep in mind that spaces are valid first characters in topic names, thus they can affect the matching process. For example, this will match both Triangle and Square topics (notice there is no space before Square):
<topic_expr>Triangle,Square</topic_expr>
However the following will only match Triangle topics (because there is a space before Square):
<topic_expr>Triangle, Square</topic_expr>
With regard to the -deserialize option, this is not applicable to the Recording Service but to the Converter tool (rtirecconv). If you want to record deserialized, you will have to indicate that in the Recording Service configuration, via the tag <deserialize_mode>. Again, see the User's Manual for details.

Custom naming of Aeroo report filename in Odoo

is there any way to get the report output from Aeroo named with a custom naming pattern?
I.e., for the invoice: [year]_[invoice number]...

#Raffaele, I'd recommend taking a look here and to this forum post.
You'll need to use some basic python logic in the report_custom_filename module to create the file name you need according to your requirements.
Using the following example I can create output for a filename for Sales Order/Quotation:
${(object.name or '').replace('/','_')}_${object.state == 'draft' and 'draft' or '' +'.xls'}
That looks like this:
SO039_.xls
You can add another field from the document/report you're printing out by adding another section, for example:
${(object.client_order_ref or '').replace('/','_')}_
this will add the field client_order_ref in front of the document name like this:
[Here's your client order reference]_SO039.xls
Have a look at what fields are available in the model you're trying to get this information from (eg. in my case sale.order) and I think you'll find roughly what you need there.
I have still not figured out how to add a date/timestamp like you are requesting (eg. Year), however someone else may be able to offer some advice on this.

Documentation with Diagram "Hyperlinks" in Enterprise Architect?

I'm struggling to get all the required (and only the required) information into the documentation of my Enterprise Architect Project. Precisely: we have modelled various requirements and displayed the source "standards" for these requirements in our diagrams by using the "hyperlink"-element out of the common toolbox. (This allows us to capture a title, the website where the documentation is found and a description of this documentation).
Now this element is visible on the diagram, but not in the package-view of our model and it does not get generated in our word (docx) documentation.
I can see that it should be possible to get this in the documentation, because a "Model Report" which basically prints everything does print the hyperlinks. But I can't find what I have to select in my template (in the package-tree view, as a package field, element field or diagram field) in order to get this printed. I can't just use the model report since this basically dumps the whole database in the document and reverse-engineering this model report has proven too difficult for me. Actually I would expect this to be in some kind of documentation for EA, but could not find such a thing with this level of detail... is there, is there a reproducible way of finding such things out in further cases? (btw I'm using EA 11.0)
[sorry there were illustrations here, but I'm not allowed to upload them...]

As Geert has already noted, there is a difference between "proper" elements and diagram-only elements. This is actually reflected in the document template editor, where there is an "Element" section inside the "Diagram" section. This will produce output for all elements in the diagram, whether or not they are also in the project browser.
Here's an example of the information you can pull out of your hyperlinks. Given a diagram with a hyperlink:
... and a template which outputs name, alias and hyperlink for each element in the diagram:
... EA will generate a document will the following contents:
So if you want the hyperlink to result in a hyperlink in the document, use the HyperlinkAlias field.
What might be a bit confusing is the fact that in addition to the Hyperlink element type in the Common diagram toolbox, EA allows you to create hyperlinks in regular elements (in the Element Properties dialog, Related tab: Files, which can be local files or web addresses).
In fact, I would recommend that you use those in your Requirement elements rather than diagram-only Hyperlinks if traceability is a priority in your model. The diagram-only Hyperlinks, on the other hand, give you a clearer visual.
Selecting a subset of the elements in a diagram ("only the required information") is a little more involved and depends on how your model is structured. Template fragments will get the job done, but you might be able to achieve your desired result by just using the filters in the document generation dialog.

The hyperlink is an element that is stored in the same package as the diagram it is used on, it is just not visible in the project browser (similar to a note element).
There's a good chance that it doesn't have a name, so make sure you don't omit nameless elements.
So if you print all the element of the package containing the diagram then you should be able to print the hyperlink as well.
In case that fails you might want to consider creating a template fragment based on an SQL query or a script. Those offer lots of flexibility to print whatever you need, even if it is located in a different package.

[Edited on 04.05.15 to reflect the comment by Uffe and provide a final solution]
Ok, based on Geerts answer, using the following custom query fragment in the diagram section:
select
t_object.ea_guid as CLASSGUID,
t_object.Object_Type as CLASSTYPE,
t_object.Object_Id as OBJECTID,
t_object.name as HL_Name,
t_object.Stereotype as HL_Stereotype,
t_object.object_type as HL_Type,
t_object.Alias as HL_Alias,
Note as Notes
--,t_object.*
from t_object
left join t_diagramobjects on (t_object.Object_ID = t_diagramobjects.Object_ID)
left join t_diagram on (t_diagram.Diagram_ID = t_diagramobjects.Diagram_ID)
where t_diagram.Diagram_ID = '#DIAGRAMID#'
and t_object.Object_Type='Text'
I was able to get a list of the hyperlinks following the diagram, this is the fragment:
custom >
{HL_Alias}: {HL_Name}
{Notes}
< custom
The "Notes" can be printed by getting the attribute directly out of the t_object table. Don't get confused as I was at first: the auto-completion on t_object and the results (t_object.*) DO NOT SHOW a Note-Attribute, but it does exist an when you write it into the query, it gets generated in the document.

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas