Can XInclude be used on stream input?

I would like a portable solution for creating a multiply nested XML document using XInclude. I am using <xi:include href="foo.xml"> elements and taking the input from a stream. So far this fails: I am using XOM, which has its own XIncluder, and it reports that it cannot find the base URI for the href. I am wondering whether this is a general problem (see XercesDOMParser and XIncludes). If so, are there general workarounds?

A relative URI like foo.xml is useless without a base URI against which to resolve it: if the base URI is http://example.net/bar/baz.xml, then the absolute URI of the resource is http://example.net/bar/foo.xml.
This base URI can come from:
The URI that the XML in question came from (clearly not applicable to a stream alone).
A URI passed to a parser by mechanisms specific to it.
An xml:base attribute in the document itself.
Means specific to a given XML application (not advisable, but sometimes necessary for compatibility with other formats, e.g. the <base /> element in XHTML duplicates xml:base unnecessarily and with less flexibility, but is required for compatibility with HTML4.01 and earlier).
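In XOM specifically, one workaround is to supply the base URI yourself when building from the stream, so that the XIncluder has something to resolve relative hrefs against. A minimal sketch, assuming XOM is on the classpath (the base URI and file names below are made up for illustration):

import java.io.InputStream;
import nu.xom.Builder;
import nu.xom.Document;
import nu.xom.xinclude.XIncluder;

public class IncludeFromStream {
    public static void main(String[] args) throws Exception {
        InputStream in = IncludeFromStream.class.getResourceAsStream("/bar/baz.xml");
        Builder builder = new Builder();
        // The second argument gives relative hrefs like "foo.xml" something
        // to resolve against, even though the input is only a stream.
        Document doc = builder.build(in, "http://example.net/bar/baz.xml");
        Document resolved = XIncluder.resolve(doc);
        System.out.println(resolved.toXML());
    }
}

Most other parsers have an equivalent hook; for SAX/DOM-based tools it is usually the systemId set on the InputSource.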

Related

How to do an FO-to-PDF transform without accessing w3.org?

Is it possible to do an FO-to-PDF transform via javax.xml.transform without accessing www.w3.org? There are, of course, references to this website in the schema, etc. Example:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:fo="http://www.w3.org/1999/XSL/Format" xmlns:fox="http://xml.apache.org/fop/extensions">
Is there a way to move files to a local machine to avoid going to the W3C server? I know this isn't ideal, but the IP accessing w3.org is currently getting HTTP 403 responses, so I need a temporary workaround while we address the larger problem. Thanks for any ideas.
The things that look like attributes and start with xmlns are namespace declarations. Neither your XSLT processor nor your XSL formatter will access the W3C servers because of the namespace declarations.
Namespaces are just a way to disambiguate elements that have the same local name. In your case, the http://www.w3.org/1999/XSL/Transform namespace URI lets the XSLT processor recognise the elements (and some attributes) that are significant for XSLT processing, and the http://www.w3.org/1999/XSL/Format namespace URI lets the XSL formatter recognise the elements that are defined by the XSL specification.
The URI that is the value of a namespace declaration doesn't need to be a resolvable URL. But for your XSLT and XSL-FO processing to work, the namespace URIs do have to be exactly the ones you are using, because processors match them by literal string comparison.
I would refer you to the Namespaces in XML specification, but I'm having a similar 403 problem myself. This tutorial from xml.com explains namespaces: https://www.xml.com/pub/a/1999/01/namespaces.html
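To illustrate: a plain javax.xml.transform run like the sketch below never dereferences the xmlns URIs (the file names are placeholders). If anything does go out to the network, it will be for things like DOCTYPE declarations or xsl:include/xsl:import hrefs, not the namespace declarations.

import java.io.File;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class FoTransform {
    public static void main(String[] args) throws Exception {
        TransformerFactory factory = TransformerFactory.newInstance();
        // The xmlns declarations in the stylesheet are matched by string
        // comparison; nothing is fetched from www.w3.org here.
        Transformer transformer = factory.newTransformer(
                new StreamSource(new File("mystyle.xsl")));
        transformer.transform(
                new StreamSource(new File("input.xml")),
                new StreamResult(new File("output.fo")));
    }
}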

Getting wrong page numbers in TOC via docx4j-export-fo

I'm using docx4j for generating Word documents, and now I need to generate a table of contents. Since version 3.3.0, docx4j uses the Plutext conversion service to get page numbers, which is not an option for me, so I need to use the docx4j-export-fo library for that purpose. But it produces the wrong numbering... It seems to get the wrong page size or something like that, because all the page numbers are off by 1-2 pages.
I've looked through the source code and the properties docx4j provides, but so far I haven't succeeded.
As per the documentation, the standalone PDF Converter (which you can download from https://converter-eval.plutext.com/) exists precisely to provide better accuracy than can be expected from docx4j-export-fo.
export-fo uses XSL FO to lay out the document, and because the XSL FO layout model is not a precise match for Word's, there are limits to what can be achieved.
That said, improvements may be possible in individual cases. You'd need to share your docx somewhere for specific feedback.
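For reference, a minimal export-fo invocation is sketched roughly below (file names are placeholders; this assumes docx4j-export-fo is on the classpath). As noted, the page numbers it feeds into the TOC come from the XSL FO layout, so they will only approximate Word's own pagination.

import java.io.File;
import java.io.FileOutputStream;
import java.io.OutputStream;
import org.docx4j.Docx4J;
import org.docx4j.openpackaging.packages.WordprocessingMLPackage;

public class ExportFoPdf {
    public static void main(String[] args) throws Exception {
        WordprocessingMLPackage pkg =
                WordprocessingMLPackage.load(new File("input.docx"));
        try (OutputStream os = new FileOutputStream("output.pdf")) {
            // Renders via the XSL FO based exporter from docx4j-export-fo.
            Docx4J.toPDF(pkg, os);
        }
    }
}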

What's the point of using absolute URLs in Pelican?

About RELATIVE_URLS, the Pelican docs say:
…there are currently two supported methods for URL formation: relative and absolute. Relative URLs are useful when testing locally, and absolute URLs are reliable and most useful when publishing.
(http://pelican.readthedocs.org/en/3.4.0/settings.html#url-settings)
But I'm confused about why absolute URLs would be better. In general, when I write HTML by hand, I prefer relative URLs because I can change the domain of the website later without worrying about it.
Can somebody explain the thinking behind this setting in more detail?
I don't use the RELATIVE_URLS setting because it's document-relative. I don't want URLs containing ../.. in them, which is often what happens when that setting is used.
Moreover, relative URLs can cause issues in Atom/RSS feeds, since all links in feeds must be absolute as per the respective feed standard specifications.
Contrary to what's implied in the original question, not using the RELATIVE_URLS setting will not cause any 404s if you later decide to change the domain. There's a difference between specifying absolute URLs in your source document (which is what you seem to be talking about) and having absolute URLs generated for you at build time (which is what Pelican does).
When it comes time to link to your own content, you can either use root-relative links, or you can use the intra-site link syntax that Pelican provides.
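For example, in a Markdown source file the two options look roughly like this (the paths are hypothetical; recent Pelican versions use the {filename} syntax, older ones used |filename|):

[the about page](/pages/about.html)
[another article]({filename}/blog/other-post.md)

The second form is resolved by Pelican at build time, so moving to a new domain only means updating SITEURL in your settings.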

Dynamically adding routes in Compojure

Hi guys: I have a "hierarchical" styled site in Compojure with a defroutes declaration like so:
(defroutes main-routes
  (GET "/" [] (resp/redirect "/public/index.html"))
  (GET "/blog" [] (resp/redirect "/public/blogs/index.html"))
  (GET "/tools" [] (resp/redirect "/public/tools/index.html")))
I would like, however, for these pages to be more dynamic - that is, I would like the index.html page to be generated by scanning the contents of the /blog directory, and likewise, for the /tools route.
That is, in the end, I would like the routes to look like so :
(defroutes main-routes
  (GET "/" [] (resp/redirect "/public/index.html"))
  (GET "/blog" [] (generate-index "/public/blog"))
  (GET "/tools" [] (generate-index "/public/tools")))
Is there a simple roadmap for building dynamic paths through my site via Compojure?
More concretely: are there any suggestions on how to build a (generate-index) function which scans the given path and returns links to all of the files in it? I assume that Compojure might already have such a feature, given the recent rise of so many blogging platforms based on this type of idiom.
Doing most of what you said is fairly simple.
There are two things that you are going to want to look at in particular, as well as some general reading which will help you understand what's going on.
First, you are going to want to take a look at some form of HTML templating tool. While it is possible to just build the necessary strings, things will be simpler if you use one. I've seen two main styles, and which to choose depends on your tastes.
Hiccup is focused on taking Clojure data structures and transforming them into HTML
Enlive is focused on taking HTML template files and transforming them into the correct end form
For actually getting the list of files, consider using file-seq. Transform the file name into the appropriate post name and file, and then use that as data to generate the links to the pages.
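A rough, untested sketch of generate-index along those lines, assuming Hiccup for the HTML and that the posts live in a directory on disk (the namespace, URL prefix, and paths are made up):

(ns myapp.index
  (:require [clojure.java.io :as io]
            [hiccup.core :refer [html]]))

(defn generate-index
  "Scan dir on disk and return an HTML page linking to every file in it,
  with hrefs built from url-prefix."
  [url-prefix dir]
  (let [files (filter #(.isFile %) (file-seq (io/file dir)))]
    (html
     [:ul
      (for [f files]
        [:li [:a {:href (str url-prefix "/" (.getName f))} (.getName f)]])])))

A Compojure handler can return that string directly as the response body, so something like (GET "/blog" [] (generate-index "/public/blogs" "resources/public/blogs")) would slot into the desired routes above.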
The other thing you're going to want to learn more about is Compojure route templates and a little more on Ring Responses.
Compojure's route templates make it easy to pass in route parameters which you can then generate responses from. Below is a simple example which serves a static HTML file, using the page name as the parameter.
(GET "/blog/:post" [post] (ring/file-response (str "/public/blogs/" post ".html")))
Finally, consider reading through the rest of the Compojure and Ring wikis. The Ring wiki gives some very good information on the core "how things work". The Compojure wiki provides some good examples of how best to make use of Compojure, which focuses on providing an easy way - but far from the only way - to handle routes and make page generation for Ring easy.
Depending on where you want the site to go, I'd also consider taking a look at Noir, which is a framework that does a nice job at pulling together all the pieces and solving some common problems in the process.

MarkLogic facets on binary content

I ingested large binary files into MarkLogic using the Content Ingestion Framework, leaving the binary files on the file system, and I used the transformation to extract metadata from the images into properties. When I search on this content using the Search API, it does not return facets. I believe this happens because the fragment returned contains the pointer to the image on the file system and not the properties document. Is there any way around this? I'd like to create faceted navigation based on the properties.
If you take a look at the Search Developer's Guide for 5.0, section 2.2.6 talks about the fragment scope option that is new in 5.0; I think that will handle your case. There's an example in there showing how to create a facet on the last-modified property using a local fragment scope, and it sounds like that pattern might be what you're looking for.
If the search API doesn't handle this use-case, you could always call cts:element-values and cts:frequency yourself. You can still use search:parse and search:resolve to provide query parsing and basic search results.
http://docs.marklogic.com/5.0doc/docapp.xqy#search.xqy?start=1&cat=all&query=cts%3Aelement-values&button=search
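A rough sketch of that do-it-yourself approach in XQuery, assuming an element range index is configured on one of your property elements (the element name camera-model is made up; substitute whatever your transformation produces):

(: requires an element range index on camera-model; cts:frequency reports the count for each value :)
for $val in cts:element-values(xs:QName("camera-model"), (), ("item-frequency"))
return <facet-value name="{$val}" count="{cts:frequency($val)}"/>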