I have XML that is processed by the JDOM library. One part of the XML is:
<fo:inline xmlns:fo="http://www.w3.org/1999/XSL/Format">Mytitle</fo:inline>
In the output PDF, 'Mytitle' is shown as 'MYTITLE' instead.
It's legacy code and I don't know much about it.
So what can be the reason for this change?
I'm pretty sure it's not JDOM that is causing this. An XSL-FO processor is transforming the XML document to PDF.
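One thing that may be worth checking (an assumption, not a diagnosis of this particular legacy code): XSL-FO defines a text-transform property, and if the stylesheet sets it to uppercase on the inline or one of its ancestors, a processor that supports it would render 'Mytitle' as 'MYTITLE'. The sketch below lists every element in an FO document that sets that property. The sample fragment and its text-transform attribute are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical XSL-FO fragment; the text-transform attribute is assumed
# here purely for illustration and is not taken from the original code.
SAMPLE_FO = """<fo:block xmlns:fo="http://www.w3.org/1999/XSL/Format" text-transform="uppercase">
  <fo:inline>Mytitle</fo:inline>
</fo:block>"""


def find_text_transforms(fo_source):
    """Return (tag, value) pairs for every element that sets text-transform."""
    root = ET.fromstring(fo_source)
    hits = []
    for element in root.iter():
        value = element.get("text-transform")
        if value is not None:
            hits.append((element.tag, value))
    return hits


if __name__ == "__main__":
    for tag, value in find_text_transforms(SAMPLE_FO):
        print(tag, value)
```

If nothing turns up in the generated FO, the XSLT that produces it would be the next place to look.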
I'm using docx4j to generate Word documents and now I need to generate a table of contents. Since version 3.3.0, docx4j uses the Plutext conversion service to get page numbers, which is not suitable for me, so I need to use the docx4j-export-fo library for that purpose. But it produces the wrong numbering. It seems to get the wrong page size or something similar, because all page numbers are off by 1-2.
I've looked through the source code and the properties docx4j provides, but so far without success.
As per the documentation, the standalone PDF Converter (which you can download from https://converter-eval.plutext.com/ ) exists precisely to provide better accuracy than can be expected from docx4j-export-fo.
export-fo uses XSL-FO to lay out the document, and because the XSL-FO layout model is not a precise match for Word's, there are limits to what can be achieved.
That said, improvements may be possible in individual cases. You'd need to share your docx somewhere for specific feedback.
Is it possible to convert a PDF to a TIFF file using XSLT? Can someone point me to an article or code I can refer to regarding image conversion using XSLT?
THANKS!
No, it is not possible using just XSLT. XSLT is for transforming XML into other textual structures (usually XML, HTML, or plain text). Using XSL-FO, you can output a PDF from XML data, but that is a one-way process as far as XSL-FO is concerned. Apache FOP does support outputting to TIFF instead of PDF, but again this is a one-way process.
Assuming you could get a PDF -> XML conversion working (a quick google suggests such libraries exist, but it's unclear what they'd actually provide), it would be possible to use XSLT to transform that XML into something Apache FOP could render into a TIFF file, but at that point you'd really be better off investigating a direct PDF to TIFF conversion library (perhaps with an OCR library).
Possible? Maybe (but likely not). The real question is why do you even want to try to create a TIFF file from a PDF file using XSLT?
You do not need XSLT.
You want a raster image processor like Ghostscript (or one of many others). It can convert PDF (and PostScript) to image formats like TIFF.
http://ghostscript.com/doc/current/Devices.htm
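As a rough sketch of the Ghostscript route, the script below builds and runs a `gs` command line. It assumes the executable is called `gs` and is on the PATH (on Windows it is usually `gswin64c` instead), and the file names, the 300 dpi resolution and the `tiff24nc` device (24-bit colour TIFF) are just illustrative choices.

```python
import shutil
import subprocess


def build_gs_command(pdf_path, tiff_path, dpi=300, device="tiff24nc"):
    """Build the Ghostscript argument list for a PDF -> TIFF conversion."""
    return [
        "gs",
        "-dNOPAUSE",                  # do not pause between pages
        "-dBATCH",                    # exit when the input is processed
        "-dSAFER",                    # restrict file system access
        f"-sDEVICE={device}",         # output device, e.g. tiff24nc or tiffg4
        f"-r{dpi}",                   # rendering resolution in dpi
        f"-sOutputFile={tiff_path}",
        pdf_path,
    ]


def pdf_to_tiff(pdf_path, tiff_path, dpi=300):
    """Run Ghostscript; raises if gs is missing or exits with an error."""
    if shutil.which("gs") is None:
        raise RuntimeError("Ghostscript executable 'gs' not found on PATH")
    subprocess.run(build_gs_command(pdf_path, tiff_path, dpi), check=True)
```

Calling `pdf_to_tiff("input.pdf", "output.tif")` would then write a multi-page TIFF, provided Ghostscript is installed.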
The only way to do that is to call a conversion service, e.g. aspose.com or to create another service externally to the DataPower box.
There might be some Node.js modules that could do it running in GatewayScript (GWS) (if you are on firmware 7+) but I believe they are all dependent on external binaries to function and that won't work in GWS.
Is it possible to format XML code in Scintilla in the same way that Visual Studio does when you paste some XML into an XML file?
At the moment, the XML that I retrieve is all on one line and therefore hard to work with. It would be great if the XML could be formatted properly on load.
Any suggestions?
You could use Notepad++ or http://xmltoolbox.appspot.com/
One way to do this is to get .NET to interpret the XML and export it as a formatted string, using XmlTextWriter - see this question. This assumes you're using Scintilla inside .NET, though; if not, you'd have to use the XML features of your platform of choice.
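To illustrate the "XML features of your platform" idea outside .NET, here is a minimal sketch in Python using the standard library's minidom. It assumes the incoming XML really is on a single line, since existing whitespace between elements would be preserved and produce extra blank lines. The function name is made up for this example. The returned string is what you would hand to the Scintilla control.

```python
from xml.dom import minidom


def pretty_print_xml(one_line_xml, indent="  "):
    """Parse single-line XML and return it re-indented, one element per line."""
    document = minidom.parseString(one_line_xml)
    return document.toprettyxml(indent=indent)


if __name__ == "__main__":
    print(pretty_print_xml("<root><item>1</item><item>2</item></root>"))
```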
I have two options in front of me for parsing a really fat XML file:
TouchXML
GDataXML
It's a lot of work because the XML file is huge, so I thought I'd ask people who have already worked with these parsers.
Which one is better for fat XML files?
I found a blog post which says that TouchXML does not edit/save XML files whereas GDataXML has that feature. What exactly do they mean by edit/save XML file feature?
Let's see if I can answer your questions:
Which one is better for fat XML files? The answer is neither. Both are DOM parsers, which load the entire document into memory to make queries faster. If you're parsing a large file, you're better off going with a SAX parser, such as the built-in NSXMLParser, or even the SAX interface of libxml2.
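The event-driven model is the same across SAX implementations, so here is a small sketch of it in Python's standard xml.sax rather than NSXMLParser. The handler receives a callback per element and keeps only a running tally, so the document is never held in memory as a tree. The class and function names are invented for the example.

```python
import io
import xml.sax


class ElementCounter(xml.sax.ContentHandler):
    """Counts start tags as the parser streams through the document."""

    def __init__(self):
        super().__init__()
        self.counts = {}

    def startElement(self, name, attrs):
        # Called once per opening tag; no tree is built.
        self.counts[name] = self.counts.get(name, 0) + 1


def count_elements(xml_stream):
    """Parse a file path or binary stream and return {tag: occurrences}."""
    handler = ElementCounter()
    xml.sax.parse(xml_stream, handler)
    return handler.counts


if __name__ == "__main__":
    sample = io.BytesIO(b"<feed><entry/><entry/></feed>")
    print(count_elements(sample))
```

With NSXMLParser the equivalent callbacks are the delegate methods, but the shape of the code is much the same.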
What exactly do they mean by edit/save XML file feature? Well, suppose you have an XML file that has your app's settings in it. If you open up that file and make changes, you're going to want to save them, right? That's where the writing comes in. The parsers that allow writing let you save the in-memory representation of the XML file back to an actual file on disk.
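As a sketch of what that edit/save round trip looks like, the example below uses Python's ElementTree rather than GDataXML. The settings file layout (`<setting name="...">` elements) and the function name are hypothetical.

```python
import os
import tempfile
import xml.etree.ElementTree as ET


def update_setting(path, name, new_value):
    """Load the XML file, change one setting in memory, write it back to disk."""
    tree = ET.parse(path)                       # read: file -> in-memory tree
    for setting in tree.getroot().iter("setting"):
        if setting.get("name") == name:
            setting.text = new_value            # edit: modify the tree
    tree.write(path, encoding="utf-8", xml_declaration=True)  # save: tree -> file


if __name__ == "__main__":
    # Hypothetical settings file created just for the demonstration.
    handle, path = tempfile.mkstemp(suffix=".xml")
    with os.fdopen(handle, "w", encoding="utf-8") as f:
        f.write('<settings><setting name="theme">light</setting></settings>')
    update_setting(path, "theme", "dark")
    print(open(path, encoding="utf-8").read())
    os.remove(path)
```

A read-only parser gives you the first two steps at most. The last line of `update_setting` is the part it lacks.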
I want to be able to generate a highly graphical (with lots of text content as well) PDF file from data that I might have in a database, in XML, or in any other structured form.
Currently our graphic designer creates these PDF files in Photoshop manually after getting the content as an MS Word document. But usually there are more than 20 revisions of the content: small changes here and there, spelling corrections, etc.
The 2 disadvantages are:
1) The graphic designer's time is unnecessarily occupied. The first version is the only one he/she should have to work on.
2) The PDF file becomes the document which now has the final revised content, and the initial content is out of sync with it. So if the initial content needs to be somewhere else (like on a website), we need to recreate it from the PDF file.
Generating the PDF file will help me solve both these problems. Perhaps some way in which the graphic designer creates a "Template" and then puts in tags/holders and maps these tags/holders to the relevant data.
Thanks :-)
There are some tools out there for doing this. XSL-FO is useful. Here is a tutorial for creating a PDF from XML (or XHTML) with Cocoon. Also see Apache FOP.
You could format your SQL data as XML and still use the same templates this way.
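A minimal sketch of that SQL-to-XML step, using Python's sqlite3 and ElementTree: it assumes the column names are valid XML element names, and the table, query and tag names are made up for the example. The resulting XML is what you would feed to the XSL-FO templates.

```python
import sqlite3
import xml.etree.ElementTree as ET


def rows_to_xml(connection, query, root_tag="rows", row_tag="row"):
    """Run a query and serialise the result set as one element per row."""
    cursor = connection.execute(query)
    # Assumption: every column name is usable as an XML element name.
    columns = [description[0] for description in cursor.description]
    root = ET.Element(root_tag)
    for record in cursor:
        row = ET.SubElement(root, row_tag)
        for column, value in zip(columns, record):
            ET.SubElement(row, column).text = "" if value is None else str(value)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE products (name TEXT, price REAL)")
    conn.execute("INSERT INTO products VALUES ('Widget', 9.5)")
    print(rows_to_xml(conn, "SELECT name, price FROM products"))
```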
I use the ReportLab Python library for this. It could perhaps solve your problem, but you will need to do some work...
In the past I have written scripts that spit out LaTeX then used texi2pdf to solve this kind of problem.
Take a look at iReport and JasperReports at http://jasperforge.org.
iReport lets you design reports, and then you can either programmatically fill them with the JasperReports library (Java), or just use iReport to manually create the report.
I have only used it for tabular data, but I don't think there would be any problem for other types of documents.
You could create a form and populate the entries programmatically using a PDF library like iText (Java).
You could look at doing the workflow in PostScript, which is plain text that you can easily compose from fragments. Then you can use any free tool to convert it to PDF.
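To give a feel for composing PostScript from fragments, here is a small sketch that builds a one-page document from a list of text lines. The font, margins and helper names are arbitrary choices for the example. Parentheses and backslashes have to be escaped because PostScript strings are delimited by parentheses.

```python
def ps_escape(text):
    """Escape the characters that are special inside a PostScript string."""
    return text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")


def ps_text_line(text, x, y):
    """One fragment: move to (x, y) in points and draw the text."""
    return f"{x} {y} moveto ({ps_escape(text)}) show"


def compose_postscript(lines, font="Helvetica", size=12, left=72, top=720, leading=16):
    """Join header, per-line fragments and showpage into a one-page document."""
    parts = ["%!PS", f"/{font} findfont {size} scalefont setfont"]
    for index, line in enumerate(lines):
        parts.append(ps_text_line(line, left, top - index * leading))
    parts.append("showpage")
    return "\n".join(parts) + "\n"


if __name__ == "__main__":
    with open("report.ps", "w", encoding="ascii") as f:
        f.write(compose_postscript(["Quarterly report", "Revenue (EUR): 1200"]))
```

Ghostscript's `ps2pdf report.ps report.pdf` is one free tool that would then turn the result into a PDF.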
Take a look at Prince XML. This tool generates PDF from XML or HTML and CSS.
A possible way is to use a template engine, like FreeMarker or StringTemplate: these are often used to generate HTML, but they are flexible enough to output any format, actually.
The problem is to make a PDF template, I suppose. Perhaps you can take a sample output and edit it to replace data with placeholders to be filled by the template engine. Might not be trivial!
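The placeholder idea can be sketched with Python's built-in string.Template, standing in here for FreeMarker or StringTemplate. The template text and field names are invented, and in practice the template would be the designer's source file (LaTeX, PostScript, XSL-FO, ...) rather than two lines of plain text.

```python
from string import Template

# Hypothetical template: a sample output with the data replaced by placeholders.
TEMPLATE = Template("Title: $title\nPrice: $price $currency\n")


def render(template, data):
    """Fill every placeholder; raises KeyError if a value is missing."""
    return template.substitute(data)


if __name__ == "__main__":
    print(render(TEMPLATE, {"title": "Widget", "price": "9.50", "currency": "EUR"}))
```

Using `substitute` rather than `safe_substitute` is deliberate: a revision that forgets a field fails loudly instead of shipping a document with a stray `$placeholder` in it.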
Sounds like a job that SQL Server Reporting Services can handle quite easily.
Reporting Services allows you to query the data, define the layout, and export to PDF without any intervention. The PDF output can be distributed via email, stored on a file share, and accessed via a page on the report server.
It can handle XML data sources too.
Another approach to generating a PDF file from data is to use Prawn, a Ruby library. I was very pleasantly surprised by how much functionality is included in Prawn. It may take some investment up front, but this approach will give you a lot of flexibility.
You can combine CSStoXSLFO with XEP from RenderX for high quality output. With this solution you can merge XML data into an XHTML template, which is decorated with CSS. It can also generate charts with the fantastic JFreeChart library. CSS3 paged media features are supported.