cTAKES UMLS ICD10 codes lookup

I created a cTAKES custom dictionary from the UMLS database with ICD10 codes.
Right now I am able to analyze text by, for example, a disease name such as Asthma, and the annotation index will contain the ICD10 code for the match: code = "J45.90".
Is it possible to configure cTAKES to reverse this process, so that it looks for occurrences of ICD10 codes in the text instead?

The XML output contains the start and end offsets of each matched concept in the original corpus. I personally find it easier to convert the XML to a simple JSON format and then loop through it as needed.
I have been working on an open source solution for parsing out the data and displaying the corpus with its matches in HTML: https://github.com/GoTeamEpsilon/ctakes-friendly-web-ui#demonstration - let me know if you'd like to contribute.
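Once the cTAKES XML (or the JSON you converted it to) has been reduced to begin/end offsets plus a code, looping over the matches is straightforward. This Java sketch assumes that parsing step is already done; `Match` and `describe` are hypothetical names, and it assumes UIMA-style offsets where `end` is exclusive. The sample text and offsets are made up to mirror the Asthma / J45.90 case from the question.

```java
import java.util.ArrayList;
import java.util.List;

public class OffsetLookup {
    // Hypothetical holder for one matched concept: character offsets into the
    // original text plus the code stored in the custom dictionary.
    static final class Match {
        final int begin;
        final int end; // exclusive, like String.substring
        final String code;

        Match(int begin, int end, String code) {
            this.begin = begin;
            this.end = end;
            this.code = code;
        }
    }

    // Returns "coveredText -> code" for each match by cutting the original
    // text at the match's begin/end offsets.
    static List<String> describe(String text, List<Match> matches) {
        List<String> out = new ArrayList<>();
        for (Match m : matches) {
            String covered = text.substring(m.begin, m.end);
            out.add(covered + " -> " + m.code);
        }
        return out;
    }

    public static void main(String[] args) {
        String text = "Patient has a history of Asthma.";
        List<Match> matches = new ArrayList<>();
        matches.add(new Match(25, 31, "J45.90"));
        for (String line : describe(text, matches)) {
            System.out.println(line); // Asthma -> J45.90
        }
    }
}
```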

Related

Asp.NET database multi-lang design [Include HTML]

I have read lots of answers here and learned something about this topic, but I need more help. Some pages in my project get their HTML (page body) from the database. For my controls I use resx files and that works great, but now I need to save the HTML values in multiple languages in my database.
I have an admin panel for my project, and I sometimes edit the HTML, text or pictures in my editor and save them back to the DB. I cache the HTML values in my website for better performance. But the HTML text parts are still only in my default language, and I want to make them multilingual.
I thought of a few possible solutions.
My first solution was to copy all of the same HTML data and create one copy per language.
Problem with the first solution: this doesn't make sense, because if I copy the values for every other language, I end up unnecessarily repeating the HTML values in my database.
My second solution was this:
This is how I normally store it in my DB (only part of the HTML, in one column):
<li>Hello World</li>
<li>Hello Me</li>
Solution:
<li>[HelloWorld]</li>
<li>[HelloMe]</li>
string newHtmlValue = oldValue.ReplaceByLang("[HelloWorld]", GetCulture(), "Hallo Welt");
// GetCulture() returns German in this example.
I created a new table for the replacement texts, keyed by the placeholder, e.g. [HelloWorld]:
Row 1: [HelloWorld] | eng | Hello World
Row 2: [HelloWorld] | german | Hallo Welt
Row 3: [HelloWorld] | fr | bonjour tout le monde
In my project I then select the right HTML value by the current culture.
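The placeholder approach from the question can be sketched as below. It is written in Java rather than C# only for illustration (the logic carries over directly); `PlaceholderLocalizer`, `put` and `localize` are hypothetical names, the nested map stands in for the replacement table, and leaving unknown keys untouched is an assumption.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PlaceholderLocalizer {
    // Matches placeholders such as [HelloWorld]; group 1 is the key.
    private static final Pattern PLACEHOLDER = Pattern.compile("\\[(\\w+)\\]");

    // culture -> (key -> localized text): one entry per row of the replace table.
    private final Map<String, Map<String, String>> translations = new HashMap<>();

    void put(String key, String culture, String text) {
        translations.computeIfAbsent(culture, c -> new HashMap<>()).put(key, text);
    }

    // Replaces every [Key] in the HTML with the text for the given culture.
    // Keys without a translation are left as-is so gaps are easy to spot.
    String localize(String html, String culture) {
        Map<String, String> table = translations.getOrDefault(culture, new HashMap<>());
        Matcher m = PLACEHOLDER.matcher(html);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            String replacement = table.getOrDefault(m.group(1), m.group(0));
            // quoteReplacement keeps '$' or '\' in the translation literal.
            m.appendReplacement(sb, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        PlaceholderLocalizer loc = new PlaceholderLocalizer();
        loc.put("HelloWorld", "eng", "Hello World");
        loc.put("HelloWorld", "german", "Hallo Welt");
        loc.put("HelloWorld", "fr", "bonjour tout le monde");
        // [HelloMe] has no row in this example, so it stays unchanged.
        System.out.println(loc.localize("<li>[HelloWorld]</li><li>[HelloMe]</li>", "german"));
    }
}
```

In a real site the map would be loaded from the table, and the localized result could be cached per culture, just as the raw HTML is cached now.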
Now my question is: do you have any better idea?
I hope I've been clear and not messy; if you need more information, I'll be glad to tell you more. Sorry for my English.
I would prefer to store all your localized strings in a single repository, such as the DB or a satellite assembly.
For example, if you choose the DB as the repository, define 3 tables (minimal structure):
1. Locale - defines your locales.
2. ResourceMaster - defines your source strings and a reference key. This key should be unique in your application; define a standard format such as Module_Control_Section.
3. LocalizedResource - defines the localized string of a ResourceMaster entry for a given Locale, with foreign keys to ResourceMaster and Locale.
In the front end you can resolve any string (a control, an HTML string) to its localized string via the unique reference key.
Also implement UI caching / API caching for better performance.
Regards
Abdul
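A minimal in-memory sketch of the three-table design, again in Java purely for illustration: the three maps stand in for the Locale, ResourceMaster and LocalizedResource tables, the foreign keys become checks in `addTranslation`, and a `ConcurrentHashMap` plays the role of the suggested cache. The class and method names, the sample key `Home_Banner_Greeting`, and falling back to the ResourceMaster source string when a translation is missing are all assumptions.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class ResourceStore {
    // Stand-in for the Locale table, e.g. "en", "de", "fr".
    private final Set<String> locales = new HashSet<>();
    // Stand-in for ResourceMaster: unique key (Module_Control_Section) -> source string.
    private final Map<String, String> resourceMaster = new HashMap<>();
    // Stand-in for LocalizedResource: "key|locale" -> localized string.
    private final Map<String, String> localizedResource = new HashMap<>();
    // Read-through cache of resolved strings.
    private final Map<String, String> cache = new ConcurrentHashMap<>();

    void addLocale(String locale) {
        locales.add(locale);
    }

    void addResource(String key, String sourceText) {
        resourceMaster.put(key, sourceText);
    }

    void addTranslation(String key, String locale, String text) {
        // Mimic the foreign keys to ResourceMaster and Locale.
        if (!resourceMaster.containsKey(key) || !locales.contains(locale)) {
            throw new IllegalArgumentException("unknown key or locale: " + key + "/" + locale);
        }
        localizedResource.put(key + "|" + locale, text);
        cache.remove(key + "|" + locale); // evict a stale cached value
    }

    // Localized string if one exists; otherwise (assumption) the ResourceMaster
    // source string; otherwise the key itself, so gaps are visible in the UI.
    String resolve(String key, String locale) {
        return cache.computeIfAbsent(key + "|" + locale, cacheKey -> {
            String text = localizedResource.get(cacheKey);
            return text != null ? text : resourceMaster.getOrDefault(key, key);
        });
    }

    public static void main(String[] args) {
        ResourceStore store = new ResourceStore();
        store.addLocale("en");
        store.addLocale("de");
        store.addResource("Home_Banner_Greeting", "Hello World");
        store.addTranslation("Home_Banner_Greeting", "de", "Hallo Welt");
        System.out.println(store.resolve("Home_Banner_Greeting", "de")); // Hallo Welt
        System.out.println(store.resolve("Home_Banner_Greeting", "en")); // Hello World (fallback)
    }
}
```

Because translations can be edited at runtime from an admin panel, the sketch evicts the cached entry whenever a translation is added; a DB-backed version would need the same kind of invalidation.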

Documentation with Diagram "Hyperlinks" in Enterprise Architect?

I'm struggling to get all the required (and only the required) information into the documentation of my Enterprise Architect Project. Precisely: we have modelled various requirements and displayed the source "standards" for these requirements in our diagrams by using the "hyperlink"-element out of the common toolbox. (This allows us to capture a title, the website where the documentation is found and a description of this documentation).
Now this element is visible on the diagram, but not in the package-view of our model and it does not get generated in our word (docx) documentation.
I can see that it should be possible to get this into the documentation, because a "Model Report", which basically prints everything, does print the hyperlinks. But I can't find what I have to select in my template (in the package tree view, as a package field, element field or diagram field) in order to get them printed. I can't just use the Model Report, since it basically dumps the whole database into the document, and reverse-engineering it has proven too difficult for me. I would expect this to be covered somewhere in the EA documentation, but I could not find anything at this level of detail. Is there a reproducible way of finding such things out in future cases? (By the way, I'm using EA 11.0.)
[Sorry, there were illustrations here, but I'm not allowed to upload them.]
As Geert has already noted, there is a difference between "proper" elements and diagram-only elements. This is actually reflected in the document template editor, where there is an "Element" section inside the "Diagram" section. This will produce output for all elements in the diagram, whether or not they are also in the project browser.
Here's an example of the information you can pull out of your hyperlinks. Given a diagram with a hyperlink, and a template which outputs the name, alias and hyperlink for each element in the diagram, EA will generate a document listing those fields for the hyperlink. [The screenshots of the diagram, template and generated document are not reproduced here.]
So if you want the hyperlink to result in a hyperlink in the document, use the HyperlinkAlias field.
What might be a bit confusing is the fact that in addition to the Hyperlink element type in the Common diagram toolbox, EA allows you to create hyperlinks in regular elements (in the Element Properties dialog, Related tab: Files, which can be local files or web addresses).
In fact, I would recommend that you use those in your Requirement elements rather than diagram-only Hyperlinks if traceability is a priority in your model. The diagram-only Hyperlinks, on the other hand, give you a clearer visual.
Selecting a subset of the elements in a diagram ("only the required information") is a little more involved and depends on how your model is structured. Template fragments will get the job done, but you might be able to achieve your desired result by just using the filters in the document generation dialog.
The hyperlink is an element that is stored in the same package as the diagram it is used on, it is just not visible in the project browser (similar to a note element).
There's a good chance that it doesn't have a name, so make sure you don't omit nameless elements.
So if you print all the elements of the package containing the diagram, you should be able to print the hyperlink as well.
In case that fails you might want to consider creating a template fragment based on an SQL query or a script. Those offer lots of flexibility to print whatever you need, even if it is located in a different package.
[Edited on 04.05.15 to reflect the comment by Uffe and provide a final solution]
OK, based on Geert's answer, I used the following custom query fragment in the diagram section:
select
t_object.ea_guid as CLASSGUID,
t_object.Object_Type as CLASSTYPE,
t_object.Object_Id as OBJECTID,
t_object.name as HL_Name,
t_object.Stereotype as HL_Stereotype,
t_object.object_type as HL_Type,
t_object.Alias as HL_Alias,
t_object.Note as Notes
--,t_object.*
from t_object
left join t_diagramobjects on (t_object.Object_ID = t_diagramobjects.Object_ID)
left join t_diagram on (t_diagram.Diagram_ID = t_diagramobjects.Diagram_ID)
where t_diagram.Diagram_ID = '#DIAGRAMID#'
and t_object.Object_Type='Text'
With this I was able to get a list of the hyperlinks printed after the diagram. This is the fragment:
custom >
{HL_Alias}: {HL_Name}
{Notes}
< custom
The "Notes" can be printed by reading the column directly from the t_object table. Don't get confused as I was at first: the auto-completion on t_object and the results of t_object.* DO NOT SHOW a Note column, but it does exist, and when you write it into the query, it gets generated in the document.

Automatic test data generation

I need to prepare sample test data with 5 million rows of different employees.
It should contain relevant information such as:
First name
Last name
Address line 1
Address line 2
ZIP code
State
County
Country
...etc.
Is there any tool I can use to generate it?
I have found the site http://www.generatedata.com/ to be good for this kind of thing. It supports a bunch of different data types and can output them in a number of formats that your code can either read in directly (e.g. CSV) or that you can easily translate into code using your favorite Unix text manipulation tools.
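If you'd rather generate the rows yourself, a seeded random generator that streams CSV is enough for 5 million rows. This Java sketch is not based on any of the tools mentioned in this thread; the value pools are tiny hypothetical samples, every row uses USA as the country, and state and county are picked independently, so combinations are not geographically consistent.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;
import java.util.Locale;
import java.util.Random;

public class FakeEmployees {
    // Tiny hypothetical value pools; real runs would use much larger lists.
    private static final String[] FIRST = {"Alice", "Bob", "Carol", "David", "Eva"};
    private static final String[] LAST = {"Smith", "Jones", "Brown", "Taylor", "Wilson"};
    private static final String[] STREET = {"Main St", "Oak Ave", "Pine Rd", "Elm St"};
    private static final String[] STATE = {"CA", "NY", "TX", "WA"};
    private static final String[] COUNTY = {"Orange", "Kings", "Travis", "King"};

    private static String pick(Random r, String[] pool) {
        return pool[r.nextInt(pool.length)];
    }

    // Writes a CSV header plus `rows` employee lines. Each line is written as
    // soon as it is built, so millions of rows never sit in memory together.
    static void write(Writer out, int rows, long seed) throws IOException {
        Random r = new Random(seed); // fixed seed -> the same data every run
        out.write("id,first_name,last_name,address_1,address_2,zip,state,county,country\n");
        for (int i = 1; i <= rows; i++) {
            // Roughly one in four employees gets an apartment number.
            String address2 = r.nextInt(4) == 0 ? "Apt " + (1 + r.nextInt(500)) : "";
            String zip = String.format(Locale.ROOT, "%05d", r.nextInt(100000));
            out.write(i + "," + pick(r, FIRST) + "," + pick(r, LAST) + ","
                    + (1 + r.nextInt(9999)) + " " + pick(r, STREET) + ","
                    + address2 + "," + zip + ","
                    + pick(r, STATE) + "," + pick(r, COUNTY) + ",USA\n");
        }
    }

    public static void main(String[] args) throws IOException {
        StringWriter sample = new StringWriter();
        write(sample, 3, 42L);
        System.out.print(sample);
    }
}
```

For the full data set, pass something like `new BufferedWriter(new FileWriter("employees.csv"))` and `5_000_000` as the row count, then close the writer when done.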
Either try a web service, such as:
http://www.generatedata.com/
http://www.mockaroo.com/
or try one of the following utilities for fake data generation:
PHP "Faker" - https://github.com/fzaninotto/Faker
Perl's Data::Faker - http://metacpan.org/pod/Data::Faker
Ruby "faker" - http://faker.rubyforge.org/
http://paulthedutchman.nl/datagenerator/
I would like to suggest a modern PHP fake data generator, which can also fake an entire entity.
Fakerino: https://github.com/niklongstone/Fakerino

Apache OpenNLP: How do I implement a dictionary based entity recognition?

I have already added the jar files to my Eclipse project.
http://opennlp.apache.org/documentation/1.5.3/apidocs/opennlp-tools/index.html
How do I do the following:
1.) Add my own names and tags.
2.) Get the names and tags that are in the dictionary.
3.) Switch between case-sensitive and case-insensitive matching.
For example, let's say I add the name "Mike Smith" with the tag "Author".
If I have text that contains that name, it should recognize that the name is there, along with its tag.
Please give actual Java code!
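None of the replies here include code, so this is a hedged, library-free Java sketch of a dictionary-based tagger covering the three requirements: adding phrases with tags, listing them back, and a case-sensitivity switch fixed at construction. It is not OpenNLP's API (OpenNLP has its own `DictionaryNameFinder` in `opennlp.tools.namefind`, worth checking in the linked API docs); `DictionaryTagger` and its methods are hypothetical, and the input is assumed to be already tokenized, since punctuation attached to a token would prevent a match.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

public class DictionaryTagger {
    // (3) Case sensitivity is chosen when the tagger is created.
    private final boolean caseSensitive;
    // Normalized phrase -> {phrase as added, tag}.
    private final Map<String, String[]> dictionary = new LinkedHashMap<>();
    // Token count of the longest phrase; bounds the match window.
    private int maxTokens = 0;

    public DictionaryTagger(boolean caseSensitive) {
        this.caseSensitive = caseSensitive;
    }

    private String normalize(String s) {
        return caseSensitive ? s : s.toLowerCase(Locale.ROOT);
    }

    // (1) Add your own name with a tag, e.g. add("Mike Smith", "Author").
    public void add(String phrase, String tag) {
        String[] tokens = phrase.trim().split("\\s+");
        String joined = String.join(" ", tokens);
        dictionary.put(normalize(joined), new String[] {joined, tag});
        maxTokens = Math.max(maxTokens, tokens.length);
    }

    // (2) Return the names and tags currently in the dictionary.
    public Map<String, String> listEntries() {
        Map<String, String> out = new LinkedHashMap<>();
        for (String[] entry : dictionary.values()) {
            out.put(entry[0], entry[1]);
        }
        return out;
    }

    // Scans tokenized text with greedy longest match; each hit is "text/tag".
    public List<String> find(String[] tokens) {
        List<String> hits = new ArrayList<>();
        int i = 0;
        while (i < tokens.length) {
            int step = 1;
            for (int len = Math.min(maxTokens, tokens.length - i); len > 0; len--) {
                String text = String.join(" ", Arrays.copyOfRange(tokens, i, i + len));
                String[] entry = dictionary.get(normalize(text));
                if (entry != null) {
                    hits.add(text + "/" + entry[1]);
                    step = len; // continue after the matched phrase
                    break;
                }
            }
            i += step;
        }
        return hits;
    }

    public static void main(String[] args) {
        DictionaryTagger tagger = new DictionaryTagger(false);
        tagger.add("Mike Smith", "Author");
        String[] tokens = "yesterday mike smith wrote a new book".split(" ");
        System.out.println(tagger.find(tokens));     // [mike smith/Author]
        System.out.println(tagger.listEntries());    // {Mike Smith=Author}
    }
}
```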
I have asked a very similar question here:
Is it possible to conduct 'Context Analysis' for precise entity extraction with OpenNLP?
The general consensus is that it's 2 steps: first identify whether your sentence contains an Author, then find the name.
I too would like to do it in 1 step (where the analysis of the corpus includes the words themselves as a way to determine the context of the name).
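The two-step approach from that thread can be sketched as follows. Both steps are crude stand-ins chosen only to keep the example self-contained: a hypothetical cue-word list replaces a trained sentence classifier in step 1, and runs of capitalized tokens replace a real name finder in step 2, so a capitalized sentence-initial word would also be reported as a name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class TwoStepAuthorTagger {
    // Step 1 stand-in: hypothetical cue words suggesting an author is mentioned.
    private static final Set<String> AUTHOR_CUES =
            new HashSet<>(Arrays.asList("wrote", "written", "author", "authored"));

    static boolean mentionsAuthor(String[] tokens) {
        for (String t : tokens) {
            if (AUTHOR_CUES.contains(t.toLowerCase(Locale.ROOT))) {
                return true;
            }
        }
        return false;
    }

    // Step 2 stand-in: each run of capitalized tokens is a name candidate.
    static List<String> nameCandidates(String[] tokens) {
        List<String> names = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (String t : tokens) {
            if (!t.isEmpty() && Character.isUpperCase(t.charAt(0))) {
                if (current.length() > 0) current.append(' ');
                current.append(t);
            } else if (current.length() > 0) {
                names.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) names.add(current.toString());
        return names;
    }

    // Only sentences that pass step 1 are searched for names in step 2.
    static List<String> findAuthors(String[] tokens) {
        return mentionsAuthor(tokens) ? nameCandidates(tokens) : new ArrayList<String>();
    }

    public static void main(String[] args) {
        System.out.println(findAuthors("the book was written by Mike Smith".split(" ")));
        System.out.println(findAuthors("Mike Smith went home".split(" ")));
    }
}
```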

Lucene query that eliminates xml tags in full text search

In Alfresco I need to write a Lucene query in such a way that it excludes the XML tags in the content while searching.
For example, if a file try.xml is searched by its content, my search should not match the XML tags.
try.xml
<sample>This is an example</sample>
If I give the search text as "sample" it should not return the file name "try.xml".
So how could I achieve this?
Edit
I have tried the query below, but it made no difference.
#cm\:name:"try*" -TEXT:"<*>" +TEXT:"sample"
What's wrong with the above query? I just tried to get the files whose name starts with "try", excluding the text inside tags, and searching for the text "sample".
By default Alfresco treats XML files as plain text and indexes the xml tags as words, that's why they can be found via full text search. XML content is handled by the StringExtractingContentTransformer in Alfresco which converts text/xml to text/plain before indexing it.
To check which transformers are registered in your Alfresco installation, you can open:
http://localhost:8080/alfresco/service/mimetypes?mimetype=text/xml#text/xml
To prevent the XML markup from being indexed, you have to write a special transformer which strips out the XML tags. See http://wiki.alfresco.com/wiki/Content_Transformations for an introduction to content transformation with Alfresco. The easiest way would be to integrate a command line utility that converts the XML file into text, or you could implement a Java class which does the transformation.
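The core of such a transformer, turning XML into plain text without the markup, can be done with the JDK's built-in SAX parser. This is only the extraction step: registering it as an Alfresco content transformer is not shown, `XmlTextExtractor` is a hypothetical name, and attribute values are dropped along with the tags.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class XmlTextExtractor {
    // Returns only the character data of an XML document: element names and
    // attribute values are dropped, so "sample" in <sample> is not kept.
    public static String extractText(String xml) throws Exception {
        StringBuilder text = new StringBuilder();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length);
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                text.append(' '); // keep words from adjacent elements apart
            }
        };
        SAXParserFactory factory = SAXParserFactory.newInstance();
        // Refuse DOCTYPEs so untrusted uploads cannot pull in external entities.
        factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        SAXParser parser = factory.newSAXParser();
        parser.parse(new InputSource(new StringReader(xml)), handler);
        // Collapse the whitespace introduced between elements.
        return text.toString().trim().replaceAll("\\s+", " ");
    }

    public static void main(String[] args) throws Exception {
        System.out.println(extractText("<sample>This is an example</sample>"));
    }
}
```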
There's no standard way to do what you need; here's an excerpt of the official documentation:
Wildcard queries: Wildcard queries using * and ? are supported as terms and phrases. For tokenized fields the pattern match cannot be exact, as all the non-token characters (whitespace, punctuation, etc.) will have been lost and treated as equal.
Basically, angle brackets are stripped out by default. You need to hack the indexing and query parsing processes in order to enable your wanted behavior.
Could you not just exclude the xml mimetype? (See http://wiki.alfresco.com/wiki/Search#Finding_nodes_by_content_mimetype for the syntax)
I guess you might want to exclude html too (so you'd exclude text/html and text/xml), that'd prevent you getting any nodes in your results that contain xml tags.