Hierarchical data in StringTemplate

I am trying to create a hierarchical document using StringTemplate.
For example, a list of directories:
\alpha
    \file1
    \file2
\beta
    \file3
\gamma
    \file4
    \file5
    \file6
Is this even possible with StringTemplate?

A simple example would be:
list.st
$it.title$
$it.children:list()$
page.st
$rootNode.children:list()$
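For this to work, the model handed to the template engine just needs the same recursive shape: every node exposes a title and a list of children. As a rough illustration of that shape only (sketched in Python; the Node class and the sample tree are assumptions chosen to match the templates above, and any host language works the same way):
class Node:
    def __init__(self, title, children=None):
        self.title = title              # rendered by $it.title$
        self.children = children or []  # recursed into by $it.children:list()$
rootNode = Node("root", [
    Node("alpha", [Node("file1"), Node("file2")]),
    Node("beta", [Node("file3")]),
    Node("gamma", [Node("file4"), Node("file5"), Node("file6")]),
])
list.st applies itself to each node's children, which is what produces the nested output.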
For more information, you can read An introduction to StringTemplate (as well as Collections and Template Groups and Complex Data Types and Renderers in the same trilogy). And don't forget the official StringTemplate documentation.

Related

How to add keywords like “lateral view explode” in Babel parser

I want to parse SQL statements (ANSI SQL or HiveQL) into an equivalent AST. When I try to parse statements containing the “lateral view explode” keywords, which is valid HiveQL syntax, Babel fails with a ParseException. Adding these to Babel's default list of keywords does not help either. Can someone point me to an example where something similar has been done?
Calcite does support the lateral keyword, but it does not support the "view explode" keywords:
https://github.com/apache/calcite/blob/master/core/src/main/codegen/templates/Parser.jj#L2083
You could extend the parser, and you might be able to use the FreeMarker support to skip the unsupported keywords (I haven't tried it myself):
https://calcite.apache.org/docs/adapter.html#extending-the-parser
However, if you need to access it through a corresponding SqlNode implementation, that will require a contribution, since you would need to modify the core module.
More about Parser.jj:
https://stackoverflow.com/a/44467850/1332098

Downloading all full-text articles in PMC and PubMed databases

According to one of the questions answered by the NCBI Help Desk, we cannot "bulk-download" PubMed Central. However, can I use the NCBI E-utilities to download all full-text papers in the PMC database using Efetch, or at least find all corresponding PMCIDs using Esearch in the Entrez Programming Utilities? If yes, then how? If the E-utilities cannot be used, is there any other way to download all full-text articles?
First of all, before you go downloading files in bulk, I highly recommend you read the E-utilities usage guidelines.
If you want full-text articles, you're going to want to limit your search to open access files. Furthermore, I suggest also restricting your search to Medline articles if you want articles that are any good. Then you can do the search.
Using Biopython, this gives us:
from Bio import Entrez
Entrez.email = "your.name@example.com"  # NCBI asks for a contact address on every request
search_query = 'medline[sb] AND "open access"[filter]'
# getting search results for the query
search_results = Entrez.read(Entrez.esearch(db="pmc", term=search_query, retmax=10, usehistory="y"))
You can use the search function on the PMC website and it will display the generated query that you can copy/paste into your code.
Now that you've done the search, you can actually download the files:
handle = Entrez.efetch(db="pmc", rettype="full", retmode="xml", retstart=0,
                       retmax=int(search_results["Count"]),
                       webenv=search_results["WebEnv"],
                       query_key=search_results["QueryKey"])
You might want to download in batches instead, driving retstart and retmax from a loop, to avoid flooding the servers.
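For example, something along these lines (just a sketch: the batch size, the output file names and the one-second pause are arbitrary choices, not values mandated by NCBI):
import time
batch_size = 100  # arbitrary
count = int(search_results["Count"])
for start in range(0, count, batch_size):
    handle = Entrez.efetch(db="pmc", rettype="full", retmode="xml",
                           retstart=start, retmax=batch_size,
                           webenv=search_results["WebEnv"],
                           query_key=search_results["QueryKey"])
    with open("pmc_batch_%d.xml" % start, "w") as out:
        out.write(handle.read())
    handle.close()
    time.sleep(1)  # be gentle with the servers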
If handle contains only one file, handle.read() contains the whole XML file as a string. If it contains more, the articles are contained in <article></article> nodes.
The full text is only available as XML, and the default parser doesn't handle XML namespaces, so you're going to be on your own with ElementTree (or another parser) to parse your XML.
Here, the articles are found via the internal history of the E-utilities, which is accessed with the webenv and query_key arguments and enabled by the usehistory="y" argument in Entrez.esearch().
A few tips about XML parsing with ElementTree: you can't delete a grandchild node directly, so you're probably going to want to delete unwanted nodes recursively through their parents. Also, node.text returns the text of a node only up to its first child, so you'll need something along the lines of "".join(node.itertext()) if you want all the text in a given node.
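For instance (a rough sketch; "table-wrap" is just an arbitrary example of a node you might want to strip before extracting the text):
import xml.etree.ElementTree as ET
root = ET.fromstring(xml_data)  # xml_data = the string returned by handle.read()
for article in root.iter("article"):
    # remove() only works on direct children, so walk parent/child pairs
    for parent in list(article.iter()):
        for child in list(parent):
            if child.tag == "table-wrap":
                parent.remove(child)
    # node.text stops at the first child; itertext() walks the whole subtree
    full_text = "".join(article.itertext())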
According to one of the questions answered by the NCBI Help Desk, we cannot "bulk-download" PubMed Central.
https://www.nlm.nih.gov/bsd/medline.html + https://www.ncbi.nlm.nih.gov/pmc/tools/openftlist/ will get you a good portion of it (I don't know the percentage). It will indeed miss the PMC full-text articles whose license doesn't allow redistribution, as explained on https://www.ncbi.nlm.nih.gov/pmc/tools/openftlist/.
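As a rough sketch of that route (the path and file name below are assumptions about the current FTP layout; check the openftlist page above before relying on them):
import urllib.request
# the file list enumerates every redistributable article and the
# .tar.gz package that contains its full text
url = "https://ftp.ncbi.nlm.nih.gov/pub/pmc/oa_file_list.csv"
urllib.request.urlretrieve(url, "oa_file_list.csv")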

Avoid duplicates in the destination schema

I have a little problem. I want to map every detail line to one OrderInfo. The destination schema cannot have any duplicate OrderInfo elements: all the detail lines should end up in the destination OrderInfo, but the SuppliersOrderNo and BuyersOrderNo should not appear twice.
Any ideas how to do this? Is it possible to use XSLT or an inline script?
<inv:OrderInfo>
  <inv:SuppliersOrderNo>123456</inv:SuppliersOrderNo>
  <inv:BuyersOrderNo>6789</inv:BuyersOrderNo>
  <inv:DetailLines>
    <inv:DetailLine>
      <inv:InvoiceDetailLineNo>1</inv:InvoiceDetailLineNo>
      <inv:Item>
        <inv:SuppliersArticleNo>article2</inv:SuppliersArticleNo>
        <inv:SuppliersDescription>BestArticle</inv:SuppliersDescription>
      </inv:Item>
    </inv:DetailLine>
    <inv:DetailLine>
      <inv:InvoiceDetailLineNo>2</inv:InvoiceDetailLineNo>
      <inv:Item>
        <inv:SuppliersArticleNo>article3</inv:SuppliersArticleNo>
        <inv:SuppliersDescription>AlmostBestArticle</inv:SuppliersDescription>
      </inv:Item>
    </inv:DetailLine>
  </inv:DetailLines>
</inv:OrderInfo>
<inv:OrderInfo>
  <inv:SuppliersOrderNo>123456</inv:SuppliersOrderNo>
  <inv:BuyersOrderNo>6789</inv:BuyersOrderNo>
  <inv:DetailLines>
    <inv:DetailLine>
      <inv:InvoiceDetailLineNo>1</inv:InvoiceDetailLineNo>
      <inv:Item>
        <inv:SuppliersArticleNo>article1337</inv:SuppliersArticleNo>
        <inv:SuppliersDescription>WOW</inv:SuppliersDescription>
      </inv:Item>
    </inv:DetailLine>
  </inv:DetailLines>
</inv:OrderInfo>
If you want to do this purely in XSLT, you'll have to use Muenchian grouping. I wrote a blog post a little while back that links to some other posts on how to do this in BizTalk: https://blog.tallan.com/2014/12/09/muenchian-grouping-in-biztalk-while-keeping-mapper-functionality/
To summarize the post: if you pursue this, you'll need a map that is completely custom XSLT somewhere, but you could move that step into a custom pipeline component if you still want to use "regular" map functionality without any other caveats (the post describes doing it in a pipeline component so that a "regular" BizTalk map can still be used on the preprocessed output). There are lots of resources on Muenchian grouping out there (including on Stack Overflow), so I'm not rehashing all of that in this answer.
You could also try serializing the message in a C# component and using some LINQ methods to group/sort/order it, or, if you're inserting the content into SQL at some point, you could do the grouping in SQL (which handles this kind of task more naturally).
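Whichever route you take, the grouping logic itself is simple. Here is a rough sketch of the idea in Python with ElementTree, purely to illustrate the logic rather than anything BizTalk-specific (the file name and the inv namespace URI are assumptions standing in for the real message):
from collections import OrderedDict
import xml.etree.ElementTree as ET
INV = "{http://example.com/inv}"  # assumption: substitute the real inv namespace URI
tree = ET.parse("invoice.xml")    # assumption: the source message saved to disk
groups = OrderedDict()
for order in tree.iter(INV + "OrderInfo"):
    key = (order.findtext(INV + "SuppliersOrderNo"),
           order.findtext(INV + "BuyersOrderNo"))
    # merge the detail lines of every OrderInfo that shares the same key
    groups.setdefault(key, []).extend(order.iter(INV + "DetailLine"))
# groups now holds one entry per (SuppliersOrderNo, BuyersOrderNo) pair,
# i.e. a single OrderInfo per order with all of its detail lines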

Documentation with Diagram "Hyperlinks" in Enterprise Architect?

I'm struggling to get all the required (and only the required) information into the documentation of my Enterprise Architect project. Specifically: we have modelled various requirements and shown the source "standards" for these requirements in our diagrams by using the "Hyperlink" element from the Common toolbox. (This allows us to capture a title, the website where the documentation is found, and a description of that documentation.)
Now this element is visible on the diagram, but not in the package view of our model, and it does not get generated into our Word (docx) documentation.
I can see that it should be possible to get this into the documentation, because a "Model Report", which basically prints everything, does print the hyperlinks. But I can't find what I have to select in my template (in the package-tree view, as a package field, element field or diagram field) in order to get this printed. I can't just use the model report, since it basically dumps the whole database into the document, and reverse-engineering the model report has proven too difficult for me. Actually, I would expect this to be covered in some kind of documentation for EA, but I could not find anything with this level of detail... Is there a reproducible way of finding such things out in future cases? (By the way, I'm using EA 11.0.)
[sorry there were illustrations here, but I'm not allowed to upload them...]
As Geert has already noted, there is a difference between "proper" elements and diagram-only elements. This is actually reflected in the document template editor, where there is an "Element" section inside the "Diagram" section. This will produce output for all elements in the diagram, whether or not they are also in the project browser.
Here's an example of the information you can pull out of your hyperlinks. Given a diagram with a hyperlink:
... and a template which outputs name, alias and hyperlink for each element in the diagram:
... EA will generate a document with the following contents:
So if you want the hyperlink to result in a hyperlink in the document, use the HyperlinkAlias field.
What might be a bit confusing is the fact that in addition to the Hyperlink element type in the Common diagram toolbox, EA allows you to create hyperlinks in regular elements (in the Element Properties dialog, Related tab: Files, which can be local files or web addresses).
In fact, I would recommend that you use those in your Requirement elements rather than diagram-only Hyperlinks if traceability is a priority in your model. The diagram-only Hyperlinks, on the other hand, give you a clearer visual.
Selecting a subset of the elements in a diagram ("only the required information") is a little more involved and depends on how your model is structured. Template fragments will get the job done, but you might be able to achieve your desired result by just using the filters in the document generation dialog.
The hyperlink is an element that is stored in the same package as the diagram it is used on; it is just not visible in the project browser (similar to a note element).
There's a good chance that it doesn't have a name, so make sure you don't omit nameless elements.
So if you print all the elements of the package containing the diagram, you should be able to print the hyperlink as well.
In case that fails you might want to consider creating a template fragment based on an SQL query or a script. Those offer lots of flexibility to print whatever you need, even if it is located in a different package.
[Edited on 04.05.15 to reflect the comment by Uffe and provide a final solution]
OK, based on Geert's answer, using the following custom query fragment in the diagram section:
select
t_object.ea_guid as CLASSGUID,
t_object.Object_Type as CLASSTYPE,
t_object.Object_Id as OBJECTID,
t_object.name as HL_Name,
t_object.Stereotype as HL_Stereotype,
t_object.object_type as HL_Type,
t_object.Alias as HL_Alias,
Note as Notes
--,t_object.*
from t_object
left join t_diagramobjects on (t_object.Object_ID = t_diagramobjects.Object_ID)
left join t_diagram on (t_diagram.Diagram_ID = t_diagramobjects.Diagram_ID)
where t_diagram.Diagram_ID = '#DIAGRAMID#'
and t_object.Object_Type='Text'
I was able to get a list of the hyperlinks following the diagram. This is the fragment:
custom >
{HL_Alias}: {HL_Name}
{Notes}
< custom
The "Notes" can be printed by getting the attribute directly out of the t_object table. Don't get confused as I was at first: the auto-completion on t_object and the results (t_object.*) DO NOT SHOW a Note-Attribute, but it does exist an when you write it into the query, it gets generated in the document.

XSLT vs. XQuery

I am new to these two technologies. I have sketched their roles in generating HTML from a raw XML file as I understand them, in these steps (please correct me if I'm wrong):
XML data source (database, RSS, ...)
XQuery (Data manipulation FLWR)
XSLT (Data representation through templating)
The resulting XHTML document to be delivered
I am wondering about the technical details of using them. To be specific, here are the questions:
How do I implement XQuery on a PHP web server (I am using the WAMP suite)?
How can I request a .xq page (can I do that directly, or should I use a CGI to do that)?
How can I pass the resulting XML page from the XQuery call to XSLT for templating?
Could you give me some pointers on the development environment for creating a website using these technologies? Thanks.
-- Update: I now understand that the difference between XQuery and XSLT is a difference in point of view, since two different working groups maintain them; both will do the job, though with different approaches.
I am now using XSLT alone for both data manipulation and presentation, following the structured templating approach described in XSLT Abstractions in order to organize the work a little bit.
I have a system that works along the lines you describe. It runs like this:
Inputs
The XML data is a plain text file, e.g. "data.xml".
The XSL stylesheet is a plain text file, e.g. "style.xsl".
The xquery is a plain text file, e.g. "test.xq".
An xquery processor is running as a service on port 2409. (More about this below.)
Flow
A PHP script, e.g. "index.php", runs. It contacts the xquery processor like this:
$xml = file_get_contents("http://localhost:2409/test.xq");
The test.xq query is executed by the xquery processor. The test.xq query uses the doc function to load the data:
declare variable $root := doc("data.xml");
When test.xq finishes, the result is returned by the xquery processor to index.php.
Back in index.php, $xml now contains the result of the test.xq xquery. An XSLT processor is invoked to transform the XML into XHTML. The PHP code is something like:
$doc = new DOMDocument();
$doc->loadXML($xml);                       // parse the XML returned by the xquery processor
$stylesheet = new DOMDocument();
$stylesheet->load("style.xsl");            // load the stylesheet from disk
$processor = new XSLTProcessor();
$processor->importStylesheet($stylesheet); // compile the stylesheet
$xhtml = $processor->transformToXML($doc); // run the transformation
echo $xhtml;                               // send the resulting XHTML to the browser
The only part of all that which is not achievable using standard components is the xquery processor. I had to write that bit using a Java servlet to invoke the Saxon xquery processor. Both Java and Saxon are free but it still took a lot of learning to get it working.
You can see it working here.
I like this technique because a) it separates logic from presentation and b) it runs fast.