Convert XSD to JSON - jackson

I have an XSD (or XML) file and have generated POJOs from it with the XJC compiler. In my XML, for example, Person is the root element and has a list of addresses, so the generated POJO lets me get that list of addresses.
What I need is this: when I call getAddresses() on the generated POJO, I want a JSON string instead of a List of Strings. Is that possible?
A friend told me it can be done with Jackson annotations, but how do I get started?
If it is possible, can anyone provide a sample code snippet or point me in the right direction?

Related

How to get all the data from a DICOM file with Imebra

I am working on a project that integrates Imebra into an Android application. The application is supposed to extract all the data from a given DICOM file and put it into an .xml file. I need a little help with it. For example, I don't know how to get all the tags (with their VRs) that the given DICOM file contains, instead of getting them one by one using tag ids.
Thank you for your help.
Load the file using CodecFactory.load(filename).
Then you can use DataSet.getTags() to retrieve a list of the tags stored in the DICOM structure.
The returned TagsIds object is a list containing every TagId: iterate over the tag IDs, read each value with DataSet.getString() (which returns the value as a string), and use DataSet.getDataType() to retrieve its VR.
When DataSet.getString() fails, you are dealing with a sequence (an embedded DICOM structure), which can be retrieved with DataSet.getSequenceItem().
You can use the static method DicomDictionary.getTagName() to get a description of a particular tag.

cTAKES UMLS ICD10 codes lookup

I created a cTAKES custom dictionary from the UMLS database with ICD10 codes.
Right now I am able to analyze text by, for example, a disease name such as Asthma, and the annotation index will contain the matching ICD10 code, e.g. code = "J45.90".
Is it possible to configure cTAKES to reverse this process, that is, to look for occurrences of ICD10 codes in the text instead?
The XML output contains the start and ends of a matched concept in the original corpus. I personally find it easier to convert the XML to a simple JSON format and then loop through it as needed.
I have been working on an open source solution for parsing out the data and displaying the corpus with its matches in HTML: https://github.com/GoTeamEpsilon/ctakes-friendly-web-ui#demonstration - let me know if you'd like to contribute.
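The XML-to-JSON step mentioned above could look roughly like the following Python sketch. It assumes that each annotation in the XML output is an element carrying begin and end attributes with character offsets into the corpus. The sample document, the element names, and the function name annotations_to_json are made up for illustration and are not part of cTAKES.

```python
import json
import xml.etree.ElementTree as ET


def annotations_to_json(xml_text, corpus):
    """Collect every element that has begin/end offsets into a JSON array."""
    matches = []
    for element in ET.fromstring(xml_text).iter():
        if "begin" in element.attrib and "end" in element.attrib:
            begin = int(element.attrib["begin"])
            end = int(element.attrib["end"])
            matches.append({
                "type": element.tag,
                "begin": begin,
                "end": end,
                # Slice the original corpus to recover the matched text
                "text": corpus[begin:end],
            })
    return json.dumps(matches)


# Hypothetical, heavily simplified input; real cTAKES output has more structure.
corpus = "Patient has asthma today."
sample_xml = '<annotations><Mention begin="12" end="18"/></annotations>'

# Loop through the simple JSON structure as needed
for match in json.loads(annotations_to_json(sample_xml, corpus)):
    print(match["begin"], match["end"], match["text"])
```

Once the matches are in this flat form, looking for a particular code or span is a matter of filtering the list rather than walking the XML again.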

Conditionally converting JSON to XML using MuleSoft

I have a simple JSON-to-XML conversion in MuleSoft. In the "Transform Message" component, I provided a JSON Schema as input and an XML Schema as output. When I run the app, the conversion happens if the file matches both schemas, but an empty XML file is generated if it doesn't.
I want the following behavior:
1) If the file matches the schema, the converted output file should be sent to the Converted folder and the original file should be moved to the Success folder.
2) If the file doesn't match the schema, the original file should be moved to the Failure folder and no conversion should take place.
I hope I have explained this clearly, as I am new to MuleSoft. Here is a sample diagram which may simplify my requirement. Please suggest a better design if mine is poor.
First, you need to create a flowVar that will hold your original payload.
When you do your evaluation, if the payload is XML, use a simple XPath expression like //elementName[not(node())], which matches elements that have no child nodes at all.
Lastly, on success, use scatter-gather to write in parallel: pull your original payload from the flowVar and write it to the Success folder, and write your regular payload to the Converted folder.
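To make the routing decision concrete, here is a minimal Python sketch of the same check outside Mule. It is not Mule configuration; the function names is_empty and choose_folder and the sample documents are hypothetical. It approximates the not(node()) predicate with the standard library parser, which ignores comments, so an element containing only a comment would also count as empty here.

```python
import xml.etree.ElementTree as ET


def is_empty(element):
    # Approximates the XPath predicate not(node()): no child elements, no text.
    return len(element) == 0 and not element.text


def choose_folder(converted_xml, element_name):
    """Return "Failure" when the converted XML is unusable, else "Success"."""
    try:
        root = ET.fromstring(converted_xml)
    except ET.ParseError:
        # An empty or malformed output file cannot be parsed at all
        return "Failure"
    if any(is_empty(el) for el in root.iter(element_name)):
        return "Failure"
    return "Success"


print(choose_folder("<order><id>42</id></order>", "id"))
print(choose_folder("<order><id/></order>", "id"))
print(choose_folder("", "id"))
```

The returned folder name stands in for the branch of the flow that would then move the original payload held in the flowVar.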

mapping a string containing xml in BizTalk

I have an xml document with a node that may optionally contain a string of escaped xml. I'd like to be able to transform that content using xsl in a BizTalk map. Any suggestion how?
I've tried:
msxsl:node-set(string). This creates a nameless single node with no content.
The document() function using a url prefix of 'data:text/xml' as suggested by helderdarocha here.
for-each selecting the text() of the node containing the string
using xpath() in an orchestration to extract the string then make a multipart message. It won't let me use an xmlDocument message as one of the messages in a multipart message transform.
Do I have to use a C# helper assembly to accomplish this?
I have tackled a similar issue in a project, where I have a series of 2 mappings (both native xslt).
The first map will map your input document to an intermediate format. This format has one "any" node (instead of the escaped XML node), where eventually, I put in the unescaped XML. I unescape using a C# extension object.
The C# code could just be a wrapper for System.Web.HttpUtility.HtmlDecode()
In the second mapping, you can map using plain XPath.
Example Input message:
<root>
  <someNode>blabla</someNode>
  <any>&lt;root2&gt;&lt;myValue&gt;escapedXml&lt;/myValue&gt;&lt;/root2&gt;</any>
</root>
Intermediate format:
<root>
  <someNode>blabla</someNode>
  <any>
    <root2>
      <myValue>escapedXml</myValue>
    </root2>
  </any>
</root>
In your second mapping, you could use XPaths like /root/any/root2/myValue/text() without any issue.
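The two steps can be illustrated outside BizTalk with a short Python sketch. It is not the XSLT/C# implementation described above: here the standard library parser decodes the escaped entities (the role played by the C# extension object), and the decoded string is then parsed and attached under the "any" node. The function name unescape_any_node is hypothetical.

```python
import xml.etree.ElementTree as ET


def unescape_any_node(xml_text, node_name="any"):
    """Step 1: replace escaped XML text inside <any> with real child elements."""
    root = ET.fromstring(xml_text)
    # Materialize the list first because the tree is modified while looping
    for node in list(root.iter(node_name)):
        if node.text and node.text.strip():
            # Parsing has already turned &lt;root2&gt; back into <root2>
            embedded = ET.fromstring(node.text)
            node.text = None
            node.append(embedded)
    return root


source = (
    "<root><someNode>blabla</someNode>"
    "<any>&lt;root2&gt;&lt;myValue&gt;escapedXml&lt;/myValue&gt;&lt;/root2&gt;</any>"
    "</root>"
)

intermediate = unescape_any_node(source)

# Step 2: plain path expressions now work on the intermediate document
value = intermediate.findtext("./any/root2/myValue")
print(value)
```

If the escaped string is not well-formed XML, ET.fromstring raises ParseError, which is a convenient place to reject the message before the second mapping.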
Important Note:
If you need to do XSD validation against this intermediate format, this approach works well for that too. You would just need to create the appropriate intermediate XSD according to your needs. In my case this was needed, so I had to validate this unescaped format using a receive pipeline execution in an orchestration.

Indexing Multiple documents and mapping to unique solr id

My use case is to index two files, a metadata file and a binary PDF file, under a single Solr id. The metadata file is XML, and some schema fields are mapped to elements in that XML file.
What I do: extract content from the PDF files (using pdftotext), process that content, and retrieve specific information (for example, the PDF's first page/line has information about the medicine and research stage). The retrieved information (medicine/research stage) needs to be indexed so that one can search, sort, and facet on it.
I can create an XML file with the retrieved information (let's call this the metadata file). Now, assuming my schema is
<field name="medicine" type="text" stored="true" indexed="true"/>
<field name="researchStage" ... />
Is there a way to put this metadata file and the PDF file in Solr?
What I have tried:
Based on a suggestion in the archives, I zipped these files and gave the zip to ExtractingRequestHandler. I was able to put all the content in Solr and make it searchable, but it appears as the content of the zip file. (I had to apply some patches to the Solr code base to make this work.) This is not sufficient, though, as the content in metadata file is not mapped to field names.
curl "http://localhost:8983/solr/update/extract?literal.id=doc1&commit=true" -F "myfile=@file.zip"
I tried to work with DataImportHandler (BinURLDataSource), but I don't think I understand how it works, so I could not get far.
I thought of adding metadata tags to the PDF itself. For this to work, ExtractingRequestHandler would have to process this metadata, and I am not sure that it does.
So I tried "pdftk" to add metadata, but I was not able to add custom tags with it; it only updates/adds title/author/keywords etc. Does anyone know a similar Unix tool?
If someone has tips, please share.
I want to avoid creating 1 file(by merging PDF text + metadata file).
Given a file record1234.pdf and metadata like:
<metadata>
<field1>value1</field1>
<field2>value2</field2>
<field3>value3</field3>
</metadata>
Do the programmatic equivalent of
curl "http://localhost:8983/solr/update/extract?
literal.id=record1234.pdf
&literal.field1=value1
&literal.field2=value2
&literal.field3=value3
&captureAttr=true&defaultField=text&capture=div&fmap.div=foo_txt&boost.foo_txt=3" -F "tutorial=@tutorial.pdf"
Adapted from http://wiki.apache.org/solr/ExtractingRequestHandler#Literals .
This will create a new entry in the index containing the text output from Tika/Solr Cell as well as the fields you specify.
You should be able to perform these operations in your favorite language.
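As one possible way of doing the programmatic part, the Python sketch below builds the request URL from the metadata file by turning each child element into a literal.<name> parameter. The function name build_extract_url is hypothetical, the commit=true parameter is borrowed from the earlier curl call, and the actual upload of the PDF (the -F part of the curl command) is left out.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode


def build_extract_url(base_url, doc_id, metadata_xml):
    """Map each metadata element to a literal.<name> query parameter."""
    params = [("literal.id", doc_id)]
    for field in ET.fromstring(metadata_xml):
        params.append(("literal." + field.tag, field.text or ""))
    params.append(("commit", "true"))
    # urlencode takes care of escaping values that contain spaces, '&', etc.
    return base_url + "?" + urlencode(params)


metadata = """
<metadata>
  <field1>value1</field1>
  <field2>value2</field2>
  <field3>value3</field3>
</metadata>
"""

url = build_extract_url(
    "http://localhost:8983/solr/update/extract", "record1234.pdf", metadata
)
print(url)
```

The resulting URL can then be used with any HTTP client that supports multipart file uploads to send record1234.pdf in the request body.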
"the content in metadata file is not mapped to field names"
If they don't map to a predefined field, then use dynamic fields. For example, you can define *_i to be an integer field.
"I want to avoid creating 1 file(by merging PDF text + metadata file)."
That looks like programmer fatigue :-) But, do you have a good reason?