Converting XSL-FO to PDF: middle dot encoding is not recognized

I am looking for a way to print a list that uses a middle dot as its bullet.
I am using an XSL file to convert it into a PDF file.
Below is a sample of my XSL file, which leads to an unrecognized middle dot in my PDF file:
<?xml version="1.0" encoding="utf-8"?>
<fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format" font-family="Calibri" >
<fo:layout-master-set>
<fo:simple-page-master master-name="FirstPage" margin-top="0.5cm">
<fo:region-body margin-top="6cm" margin-bottom="1.5cm" margin-left="2cm" margin-right="2cm" />
<fo:region-before extent="1cm" region-name="first-page-header"/>
<fo:region-after extent="1cm" region-name="first-page-footer" padding-left="1cm"/>
</fo:simple-page-master>
</fo:layout-master-set>
<fo:flow flow-name="xsl-region-body">
<fo:block>
<fo:inline font-family="Symbol">·</fo:inline>
</fo:block>
</fo:flow>
I have tried to include this symbol as "&#183;", "&#xb7;" and "&middot;", but none of them worked.
I also tried adding encoding-mode="single-byte" next to font-family="Calibri", but still no success.
Any idea why it does not work?

Drop the font-family="Symbol". At present, you are using the Symbol font for the · and not using Calibri, so changing the configuration for Calibri isn't going to change anything.
According to 'Character Map' on Windows 7, Symbol does not have a character at 0xB7 and Calibri has Middle Dot at 0xB7.
(If the purpose of the fo:inline is just to change the font, then you can also drop the fo:inline.)
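Putting that together, a minimal sketch of the corrected document might look like the following. It assumes the formatter can actually locate Calibri (with some formatters the font has to be registered or auto-detected in the configuration); the fo:page-sequence wrapper and the item text are illustrative and not taken from the question.

```xml
<?xml version="1.0" encoding="utf-8"?>
<fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format" font-family="Calibri">
  <fo:layout-master-set>
    <fo:simple-page-master master-name="FirstPage" margin-top="0.5cm">
      <fo:region-body margin-top="6cm" margin-bottom="1.5cm" margin-left="2cm" margin-right="2cm"/>
    </fo:simple-page-master>
  </fo:layout-master-set>
  <fo:page-sequence master-reference="FirstPage">
    <fo:flow flow-name="xsl-region-body">
      <!-- No Symbol font override: the dot comes from the inherited font (Calibri). -->
      <!-- The reference below is U+00B7 MIDDLE DOT; the trailing semicolon is required. -->
      <fo:block>&#xB7; First item (illustrative text)</fo:block>
      <fo:block>&#xB7; Second item (illustrative text)</fo:block>
    </fo:flow>
  </fo:page-sequence>
</fo:root>
```

Because the file is declared as UTF-8, typing the literal · character should be equivalent to the numeric reference, provided the file really is saved as UTF-8.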

Related

xsl:fo Increment a variable inside page sequences?

I have several page sequences in my XSL file. A named template is called inside each page sequence. Inside each template I have a block containing a variable that needs to be incremented whenever the block is executed. I tried to use a global variable, but many posts here say that a global variable cannot be incremented in XSL-FO. Can someone please guide me on how to do that?
My XSL file is something like this:
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:fo="http://www.w3.org/1999/XSL/Format" version="1.0"
xmlns:pdf="http://xmlgraphics.apache.org/fop/extensions/pdf">
<xsl:output encoding="UTF-8" indent="yes" method="xml"
standalone="no" omit-xml-declaration="no" />
<xsl:template match="analyseData">
<fo:page-sequence master-reference="simpleA6">
<fo:flow flow-name="xsl-region-body" border-collapse="collapse">
<xsl:call-template name="template1" />
</fo:flow>
</fo:page-sequence>
<fo:page-sequence master-reference="simpleA6">
<fo:flow flow-name="xsl-region-body" border-collapse="collapse">
<xsl:call-template name="template2" />
</fo:flow>
</fo:page-sequence>
<fo:page-sequence master-reference="simpleA6">
<fo:flow flow-name="xsl-region-body" border-collapse="collapse">
<xsl:call-template name="template3" />
</fo:flow>
</fo:page-sequence>
</xsl:template>
If you can come up with a pattern for all of the nodes in the source that need to be numbered and the numbering sequence is in the document order of the nodes in the source document, then you could use <xsl:number level="any" count="..." /> to do the counting. See https://www.w3.org/TR/xslt20/#numbering-based-on-position
If the count sequence doesn't match your source document or you can't find a pattern, then you're probably back to post-processing, as @kevin-brown suggests.
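As a rough illustration of the xsl:number approach, here is a minimal sketch. The element name item is hypothetical, since the question does not show the source document; substitute whatever pattern matches the nodes to be numbered.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:fo="http://www.w3.org/1999/XSL/Format">

  <!-- "item" is a hypothetical source element; replace it with the real pattern. -->
  <xsl:template match="item">
    <fo:block>
      <!-- level="any" counts the matching nodes at or before the current node
           in document order, regardless of nesting depth. -->
      <xsl:number level="any" count="item"/>
      <xsl:text>. </xsl:text>
      <xsl:value-of select="."/>
    </fo:block>
  </xsl:template>

</xsl:stylesheet>
```

The number comes from the position of the node in the source, not from how many times a template has run, which is why this only works when the two coincide.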
Well, there is very limited information here but I could guess.
Nothing stops you from using XSL and an identity-transform to modify some interim result.
So you could do what you are doing. Whenever you need to output this counter, why not write to the output <counter/>. Nothing more, just an empty tag that represents the counter.
Then write an identity-transform XSL that outputs the resulting file as is, except for a match on <counter>. This template would replace it with:
<fo:inline><xsl:value-of select="count(preceding::counter) + 1"/></fo:inline>
Note: you could maybe also use <xsl:number> here.
So you would do:
XML+XSL -> XSL FO with counters + XSL identity change counters -> XSL FO -> format with your formatter
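The second pass in that pipeline could be sketched as the stylesheet below. It assumes the first pass writes the placeholder as a <counter/> element in no namespace and that the placeholders are never nested inside one another.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:fo="http://www.w3.org/1999/XSL/Format">

  <!-- Identity template: copy every node and attribute through unchanged. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Replace each placeholder with its 1-based position among all counters. -->
  <xsl:template match="counter">
    <fo:inline>
      <xsl:value-of select="count(preceding::counter) + 1"/>
    </fo:inline>
  </xsl:template>

</xsl:stylesheet>
```

The output of this pass is plain XSL-FO with the numbers filled in, ready for the formatter.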

Data-driven titles in PDFs from Julia using Taro and Mustache

I have some test demographic data in the following form (truncated for demo purposes):
TreatmentArm Site-Subject Gender Age
Placebo 000001-000002 M 42
Placebo 000001-000043 F 23
Placebo 000003-000076 F 45
.
.
Active 000001-000003 M 56
Active 000003-000098 F 34
I can produce a PDF with headers, footers and a table showing the data in the above structure. However, the treatment arm repeats are unnecessary and would normally be handled by sub-titles:
Treatment Arm: Placebo
Site-Subject Gender Age
000001-000002 M 42
000001-000043 F 23
000003-000076 F 45
<page-break>
Treatment Arm: Active
Site-Subject Gender Age
000001-000003 M 56
000003-000098 F 34
So, the two values of treatment arm are controlling the text in the sub-title and the change of the treatment arm value is triggering a page-throw.
The language I'm trying here is Julia, which has, so far, acquitted itself very well. In order to write a simple report to a PDF I need to use a package called Taro, which also uses a port of Mustache.js to Julia. Here, Julia is calling Java, which uses Apache FOP to produce the PDF. Julia calls an XSL-FO template and the Mustache render function marries the data to the template.
Hence, there are 2 source files: the Julia program and the XSL-FO template files. First, the Julia source, abridged as far as possible:
using Taro
# init() once per session to set the Java classpath
Taro.init()
using Mustache
using DataFrames
# get the xsl-fo template
tmpl = Mustache.template_from_file("tables.fo.tmpl")
# get the data, process, sort and select columns
df = readtable("DM1.csv")
df[:sitesubj] = map(x->x[8:end], df[:usubjid])
df2 = sort(df[:, [:armcd, :arm, :sitesubj, :age, :sex]], cols = [:armcd, :sitesubj])
# Write the data to an Array of Dictionaries
d=Array(Dict, nrow(df2));
for i in 1:length(d)
d[i] = Dict{ASCIIString,Any}(
"armcd"=>df2[i, :armcd],
"arm"=>df2[i, :arm],
"sitesubj"=>df2[i, :sitesubj],
"age"=>df2[i, :age],
"sex"=>df2[i, :sex],
)
end
# Some Mustache magic. Render adds the data to the report template
# tn is a String, to is an IOStream
tn, to=mktemp()
fo=render(tmpl, D=d)
write(to, fo)
close(to)
Taro.fo(tn, "test_listing.pdf")
And now the template, abridged as far as possible, but leaving a working example, albeit without the sub-title I need:
<?xml version="1.0" encoding="UTF-8"?>
<fo:root font-family="Courier" font-size="10pt" xmlns:fo="http://www.w3.org/1999/XSL/Format">
<fo:layout-master-set>
<fo:simple-page-master master-name="A4-landscape"
margin-right="0.5cm"
margin-left="0.5cm"
margin-bottom="0.5cm"
margin-top="0.5cm"
page-width="29.7cm"
page-height="21cm">
<fo:region-body margin-top="4cm" margin-bottom="3cm"/>
<fo:region-before extent="8cm"/>
<fo:region-after extent="3cm"/>
</fo:simple-page-master>
</fo:layout-master-set>
<fo:page-sequence master-reference="A4-landscape">
<!-- Headers -->
<fo:static-content flow-name="xsl-region-before">
<fo:block line-height="14pt" font-size="8pt" text-align-last="justify">ACME Corp
<fo:leader leader-pattern="space" />
CONFIDENTIAL
</fo:block>
<fo:block line-height="14pt" font-size="8pt" text-align-last="justify">XYZ123 / Anti-Hypertensive
<fo:leader leader-pattern="space" />
Draft
</fo:block>
<fo:block line-height="14pt" font-size="8pt" text-align="left">Protocol XYZ123</fo:block>
<fo:block line-height="14pt" font-size="8pt" text-align="center">Study XYZ123</fo:block>
<fo:block line-height="14pt" font-size="8pt" text-align="center">Listing of Demographic Data by Treatment Arm</fo:block>
<fo:block line-height="14pt" font-size="8pt" text-align="center">All Subjects</fo:block>
<fo:block text-align="left">
<!-- Here is where I need to add the current ARM value in a sub-title -->
<!-- fo:retrieve-marker ?? -->
</fo:block>
</fo:static-content>
<!-- Footers -->
<fo:static-content flow-name="xsl-region-after">
<fo:block line-height="14pt" font-size="8pt" text-align="left">A long explanatory text</fo:block>
<fo:block line-height="14pt" font-size="8pt" text-align="left">All subjects are included in the listing including the screen failures</fo:block>
<fo:block line-height="14pt" font-size="8pt" text-align="left">All measurements were taken at the screening visit</fo:block>
<fo:block line-height="14pt" font-size="8pt" text-align-last="left"> Page <fo:page-number/> of <fo:page-number-citation ref-id="end"/>
</fo:block>
</fo:static-content>
<fo:flow flow-name="xsl-region-body">
<!-- Here I need to capture the value of the current arm -->
<!-- Set a marker ?? -->
<!-- Cannot use {{#:D}} to {{/:D}} as this captures values across all rows -->
<fo:table table-layout="fixed" width="100%" >
<fo:table-column column-width="2cm"/>
<fo:table-column column-width="6cm"/>
<fo:table-column column-width="2cm"/>
<fo:table-column column-width="3cm"/>
<fo:table-header border-bottom-style="solid" border-top-style="solid">
<fo:table-row space-after="10px">
<fo:table-cell>
<fo:block>Arm</fo:block>
</fo:table-cell>
<fo:table-cell>
<fo:block>Site ID - Subject ID</fo:block>
</fo:table-cell>
<fo:table-cell>
<fo:block>Age</fo:block>
</fo:table-cell>
<fo:table-cell>
<fo:block>Gender</fo:block>
</fo:table-cell>
</fo:table-row>
</fo:table-header>
<fo:table-body border-bottom-style="solid">
{{#:D}}
<fo:table-row keep-together.within-page="always">
<fo:table-cell>
<fo:block>{{arm}}</fo:block>
</fo:table-cell>
<fo:table-cell>
<fo:block>{{sitesubj}}</fo:block>
</fo:table-cell>
<fo:table-cell>
<fo:block>{{age}}</fo:block>
</fo:table-cell>
<fo:table-cell>
<fo:block>{{sex}}</fo:block>
</fo:table-cell>
</fo:table-row>
{{/:D}}
</fo:table-body>
</fo:table>
<fo:block id="end"/>
</fo:flow>
</fo:page-sequence>
</fo:root>
One of the problems here is the mixture of technologies and deciding where to address the problem. Do I need to pre-summarize in Julia and pass another dictionary with just the 'Placebo' and 'Active' values to the template? Even so, there must be some mechanism to recognise position within the template. I don't think it is possible to add XSL directives to the mix, so logic within the template eludes me. As the comments in the XSL-FO file suggest, maybe the way is to set and retrieve markers, but the recognition of the 'first row in a BY group' does not seem to be present. I hope I am wrong and this is possible.
The paging issue seems to have been solved by
<fo:table-row keep-together.within-page="always">
But this is not like saying 'when this condition is reached, throw a page.'
So, if anybody has suggestions, I'm more than happy to test them. Many thanks.
Since Mustache is, by design, 'logic-less', I would implement this by summarising in Julia. Pass in an outer array. Each row in that array has two columns. The first column contains the "Arm", and the second column contains another array, containing the data for that "Arm".
In addition, the subheader should not be in static-content. It should be part of the flow content. Am I missing something here?
These changes, combined with keep-together, should give you what you need.
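A minimal sketch of how the flow part of the template might look with such a nested structure is shown below. The names G and rows are hypothetical: it assumes the Julia side builds one dictionary per arm with keys "arm" and "rows" and passes the array as render(tmpl, G=groups). The break-before property is a standard XSL-FO way to force a page break; it is a suggestion here, not something the original template used.

```xml
<fo:flow flow-name="xsl-region-body">
  {{#:G}}
  <!-- One sub-title per treatment arm; break-before starts each arm on a new page. -->
  <fo:block break-before="page" keep-with-next.within-page="always">Treatment Arm: {{arm}}</fo:block>
  <fo:table table-layout="fixed" width="100%">
    <fo:table-column column-width="6cm"/>
    <fo:table-column column-width="2cm"/>
    <fo:table-column column-width="3cm"/>
    <fo:table-header border-bottom-style="solid" border-top-style="solid">
      <fo:table-row>
        <fo:table-cell><fo:block>Site ID - Subject ID</fo:block></fo:table-cell>
        <fo:table-cell><fo:block>Age</fo:block></fo:table-cell>
        <fo:table-cell><fo:block>Gender</fo:block></fo:table-cell>
      </fo:table-row>
    </fo:table-header>
    <fo:table-body border-bottom-style="solid">
      {{#rows}}
      <fo:table-row keep-together.within-page="always">
        <fo:table-cell><fo:block>{{sitesubj}}</fo:block></fo:table-cell>
        <fo:table-cell><fo:block>{{age}}</fo:block></fo:table-cell>
        <fo:table-cell><fo:block>{{sex}}</fo:block></fo:table-cell>
      </fo:table-row>
      {{/rows}}
    </fo:table-body>
  </fo:table>
  {{/:G}}
  <!-- Kept outside the loop so "Page x of y" still refers to the last page. -->
  <fo:block id="end"/>
</fo:flow>
```

This fragment is meant to replace the existing fo:flow element, so the fo namespace declaration on fo:root still applies. Whether the section tags need the leading colon depends on how the keys are passed to Mustache, so that detail may need adjusting.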

Apache fop generated PDF renders hyperlinks by replacing accented characters with '?'

I have used Apache FOP 1.1 to programmatically generate PDF. The PDF is supposed to contain name of a document as hyperlink. When I click on the name, it should open the corresponding file. Here is the code:
<fo:block>
<fo:basic-link color="blue" show-destination="new">
<xsl:attribute name="external-destination">
<xsl:choose>
<xsl:when test="@parentFolderPath">
<xsl:value-of select="@parentFolderPath" />/<xsl:value-of select="@FileName" />
</xsl:when>
<xsl:otherwise>
<xsl:value-of select="@FileName" />
</xsl:otherwise>
</xsl:choose>
</xsl:attribute><xsl:call-template name="writeWithoutOverlap"><xsl:with-param name="data" select="@FileName"/></xsl:call-template>
</fo:basic-link>
</fo:block>
This works perfectly fine when I have a file having English characters in the name. However, when I have a file name like this: "étudiant où forêt naïve garçon.docx" the hyperlink formed in the PDF replaces the accented characters with '?'.
This is a screen shot of the PDF where you can see the malformed hyperlink:
I am using "Arial" font and encoding="UTF-8".
When the name of the file is getting printed correctly, why is the hyperlink giving a problem?

XSLT new lines not being preserved

For some reason my spaces aren't being preserved in my final PDF after xslt. My desired output is:
Static
text bold.
Here's my xslt template:
<xsl:preserve-space elements="*" />
<xsl:strip-space elements="" />
<xsl:template match="coverPage">
<fo:block font-size="12pt" color="black" text-align="center">
<xsl:text>
Static
text
</xsl:text>
</fo:block>
<fo:block font-size="12pt" color="black" text-align="center" font-weight="bold">
<xsl:text>
bold.
</xsl:text>
</fo:block>
</xsl:template>
I think there are a few issues with your XSLT:
there is no need to enclose the text inside xsl:text elements, as those text nodes are not composed only of whitespace characters and therefore will never be stripped (see Whitespace stripping for more details)
for the same reason, there is no need to use xsl:preserve-space and xsl:strip-space, unless of course you need them for other reasons
preserving linefeeds in the transformation from XML to XSL-FO is just the (required) first step, but then you must preserve them during the processing of the XSL-FO file; in order to do this, you must use the linefeed-treatment property: linefeed-treatment="preserve"
a literal linefeed is equivalent to a &#x0A; entity, so in your input you have 3 linefeeds between "Static" and "text", which will produce two empty lines when preserved; if that's not what you want, you have to remove some of them
the words "text" and "bold" are inside two different fo:block elements, so this means they will always be on different lines; if you want them to be placed one beside the other, those words must be inside fo:inline elements instead (and there must be an outer fo:block to contain them)
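Taken together, the points above suggest something like the following sketch: a single block with linefeed-treatment="preserve", one literal linefeed between "Static" and "text", and an inline for the bold word. The namespace declaration is included only so the fragment stands on its own; in a real stylesheet it would normally be declared on an ancestor element.

```xml
<fo:block xmlns:fo="http://www.w3.org/1999/XSL/Format"
          font-size="12pt" color="black" text-align="center"
          linefeed-treatment="preserve">Static
text <fo:inline font-weight="bold">bold.</fo:inline></fo:block>
```

The text starts immediately after the opening tag so that no extra linefeed is preserved before "Static".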
A final word of warning
While looking at an FO file the difference between a preserved linefeed and an ignored one is not immediately apparent, as it boils down to the presence of the linefeed-treatment attribute in an ancestor element (which could be quite far from the text node itself).
Clearer ways to force a line break in a specific position include:
using different fo:block elements, each one containing the text that should create a line (or several ones)
<fo:block>Static</fo:block>
<fo:block>text <fo:inline font-weight="bold">bold.</fo:inline></fo:block>
using an empty fo:block where a line break should be
<fo:block>
Static
<fo:block/>
text <fo:inline font-weight="bold">bold.</fo:inline>
</fo:block>

SVG to PDF converter that preserves text

I'm looking for an SVG to PDF converter that preserves the text in the SVG. I've tried Batik, Inkscape, and CairoSVG. The PDF generated by each of them is a bitmap image, including the text; the text cannot be selected or searched in a PDF viewer. None of them does a great job either, especially CairoSVG.
I followed the directions here (note that you don't have to compile FOP - you can download the PDF transcoder from here). Now I see that if I zoom into the PDF, the clarity is preserved, which I assume means the text is preserved. However, I cannot search or select the text.
Also, I compared the output of using PDF transcoder from FOP versus what's in Batik, and I see no difference.
If you're using filters, gradients or masking, it may be impossible to translate the SVG 1:1 to PDF. In these cases, converters usually rasterize the vector data to achieve a similar visual appearance, rather than preserving the vector data and ending up with a very different look.
Edit: In your example case, we can make sure that fill attributes are used instead of filters with the help of the following XSLT transformation:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
version="1.0" xmlns="http://www.w3.org/2000/svg" xmlns:svg="http://www.w3.org/2000/svg">
<xsl:template match="node()|@*">
<xsl:copy>
<xsl:apply-templates select="node()|@*"/>
</xsl:copy>
</xsl:template>
<xsl:template match="@fill[ancestor::svg:symbol]" priority="1">
<xsl:attribute name="fill">currentColor</xsl:attribute>
</xsl:template>
<xsl:template match="@filter[starts-with(.,'url(#colorFilter-')]">
<xsl:attribute name="color">
<xsl:value-of select="concat('#',substring(.,18,6))"/>
</xsl:attribute>
</xsl:template>
<xsl:template match="svg:use[not(@filter)]">
<xsl:copy>
<xsl:attribute name="color">#fff</xsl:attribute>
<xsl:apply-templates select="node()|@*"/>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
This fully relies on how in this particular SVG the filters are named, so it's not applicable to anything else. The colors aren't quite right, though. I'd be very interested in learning why this color matrix:
0.4 0 0 0 0
0 0.6 0 0 0
0 0 0.8 0 0
0 0 0 1 0
applied to white obviously does not result in rgba(40%,60%,80%,1).
Have a look at rsvg-convert, part of librsvg. I have used it to convert SVG documents to PDF and it preserves text such that it is selectable and searchable in PDF viewers.
Here is a blog post comparing it to some other options, and showing how to use it: https://www.itsfullofstars.de/tag/rsvg-convert/
Have you tried printing the SVG to a PDF printer?