How to generate a PDF invoice using Apache PDFBox - pdfbox

I have a requirement in my project to generate an invoice with the Apache PDFBox API. So far I can insert images and text into the generated PDF, but I'm having difficulty generating tables. I couldn't find a single example template. If anybody has one, please provide a link.
Note: I can't use iText.
Thanks in advance.

This question may be a duplicate of "How to create Table using Apache PDFBox". However, I found two solutions that are built on Apache PDFBox:
easytable: I used this to create tables. It has some good features, such as a table automatically continuing onto a new page when it reaches the bottom of the current page. However, I had difficulty adding both text and an image to the same cell, or text in multiple styles to the same cell. If you need either of these, better check the alternatives.
boxable: I'm not very familiar with this one, but I've heard it has some nice features, such as converting CSV data directly into a table.
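If you'd rather not pull in another library, a simple fixed-layout table can also be drawn with plain PDFBox: stroke the grid lines, then place each cell's text with showText. The sketch below assumes PDFBox 2.0.x (in 3.0 the standard fonts are created with new PDType1Font(Standard14Fonts.FontName.HELVETICA) instead of the PDType1Font.HELVETICA constant). The class name, column widths, row height and sample rows are hypothetical, and it does no wrapping or page breaking, which is what easytable and boxable add on top.

```java
import java.io.IOException;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.PDPageContentStream;
import org.apache.pdfbox.pdmodel.common.PDRectangle;
import org.apache.pdfbox.pdmodel.font.PDFont;
import org.apache.pdfbox.pdmodel.font.PDType1Font;

/** Hypothetical helper: draws a fixed-layout invoice table with PDFBox 2.0.x. */
public class InvoiceTable {

    static final float ROW_HEIGHT = 20f;
    static final float CELL_PADDING = 5f;

    /** Total width of all columns. */
    static float tableWidth(float[] colWidths) {
        float sum = 0;
        for (float w : colWidths) sum += w;
        return sum;
    }

    /** Draws the grid and cell text; (x, yTop) is the table's upper-left corner. */
    static void drawTable(PDPageContentStream cs, PDFont font, float fontSize,
                          float x, float yTop, float[] colWidths, String[][] rows)
            throws IOException {
        float width = tableWidth(colWidths);
        float height = rows.length * ROW_HEIGHT;

        // Horizontal lines: one above each row plus the bottom border.
        for (int r = 0; r <= rows.length; r++) {
            float y = yTop - r * ROW_HEIGHT;
            cs.moveTo(x, y);
            cs.lineTo(x + width, y);
        }
        // Vertical lines: one left of each column plus the right border.
        float colX = x;
        for (int c = 0; c <= colWidths.length; c++) {
            cs.moveTo(colX, yTop);
            cs.lineTo(colX, yTop - height);
            if (c < colWidths.length) colX += colWidths[c];
        }
        cs.stroke();

        // Cell text, placed near the bottom-left corner of each cell (no wrapping).
        for (int r = 0; r < rows.length; r++) {
            float cellX = x;
            for (int c = 0; c < colWidths.length && c < rows[r].length; c++) {
                cs.beginText();
                cs.setFont(font, fontSize);
                cs.newLineAtOffset(cellX + CELL_PADDING,
                                   yTop - (r + 1) * ROW_HEIGHT + CELL_PADDING);
                cs.showText(rows[r][c]);
                cs.endText();
                cellX += colWidths[c];
            }
        }
    }

    public static void main(String[] args) throws IOException {
        String[][] rows = {
            {"Item", "Qty", "Unit price", "Total"},
            {"Widget", "2", "10.00", "20.00"},
            {"Gadget", "1", "15.50", "15.50"},
        };
        float[] colWidths = {200f, 60f, 100f, 100f};

        try (PDDocument doc = new PDDocument()) {
            PDPage page = new PDPage(PDRectangle.A4);
            doc.addPage(page);
            try (PDPageContentStream cs = new PDPageContentStream(doc, page)) {
                float top = page.getMediaBox().getHeight() - 100;
                drawTable(cs, PDType1Font.HELVETICA, 10, 50, top, colWidths, rows);
            }
            doc.save("invoice.pdf");
        }
    }
}
```

Because every cell is just a text-drawing call at known coordinates, putting an image into a cell would mean calling drawImage at that cell's position yourself; this is the kind of bookkeeping the libraries above take care of.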

Related

Print multiple pages using PDFBox

I have a list of text data containing links (PDActions) that might need to be rendered on more than one page. (see below)
**Table of Contents**
document1 link 5
document2 link 8
document3 link 11
Is there a simple way to just print all this content, let PDFBox wrap the text and spread it over multiple pages as needed, and then just give me the final PDDocument?
There are multiple answers on this topic such as this one. However, the answers are quite old, and I'm checking if there is a newer and simpler way to do it.
PDFBox version: 2.0.26
PDFBox essentially still only has that very low-level text drawing API, but there are projects built on top of PDFBox that offer automatic layout.
Allow me to quote the PDFBox FAQ:
Can I use PDFBox to create complex layouts?
I'd like to use PDFBox to create a complex layout containing several paragraphs, tables, images etc. Is PDFBox fit for that purpose?
PDFBox being a low level PDF library provides the APIs to create page content such as text, images etc. But at this point in time it doesn't provide a higher level API to do page layout, paragraph handling, automatic line wrapping or create tables and such.
But PDFBox is the foundation of some projects which might help in that case. This includes projects such as
Boxable
BoxTable
easytable
pdfbox-layout
PdfLayoutManager
ph-pdf-layout
You may also want to consider using Apache FOP, which allows you to create complex documents from XML data and templates.
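For plain text, the wrapping and page breaking those libraries do can be approximated by hand with the low-level API: measure text with PDFont.getStringWidth (which returns units of 1/1000 of the font size), break lines greedily by words, and start a new page whenever the cursor reaches the bottom margin. A minimal sketch assuming PDFBox 2.0.x; the class name, margins and font size are hypothetical, and it does not recreate the link annotations (PDActions) from the question, which would need a PDAnnotationLink placed at each line's position.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.PDPageContentStream;
import org.apache.pdfbox.pdmodel.common.PDRectangle;
import org.apache.pdfbox.pdmodel.font.PDFont;
import org.apache.pdfbox.pdmodel.font.PDType1Font;

/** Hypothetical helper: greedy word wrap plus automatic page breaks (PDFBox 2.0.x). */
public class MultiPageText {

    static final float MARGIN = 50f;
    static final float FONT_SIZE = 12f;
    static final float LEADING = 1.5f * FONT_SIZE;

    /** Width of a string in points; getStringWidth returns 1/1000 text-space units. */
    static float width(PDFont font, String s) throws IOException {
        return font.getStringWidth(s) / 1000f * FONT_SIZE;
    }

    /** Greedy word wrap; a single word wider than maxWidth stays on its own line. */
    static List<String> wrap(PDFont font, String paragraph, float maxWidth) throws IOException {
        List<String> lines = new ArrayList<>();
        StringBuilder line = new StringBuilder();
        for (String word : paragraph.split("\\s+")) {
            if (word.isEmpty()) continue;
            String candidate = line.length() == 0 ? word : line + " " + word;
            if (line.length() > 0 && width(font, candidate) > maxWidth) {
                lines.add(line.toString());
                line = new StringBuilder(word);
            } else {
                line = new StringBuilder(candidate);
            }
        }
        if (line.length() > 0) lines.add(line.toString());
        return lines;
    }

    /** Writes all paragraphs, adding A4 pages as needed, and returns the document. */
    static PDDocument render(List<String> paragraphs) throws IOException {
        PDFont font = PDType1Font.HELVETICA;
        PDDocument doc = new PDDocument();
        PDRectangle box = PDRectangle.A4;
        float maxWidth = box.getWidth() - 2 * MARGIN;

        PDPageContentStream cs = null;
        float y = 0; // set when the first page is created
        for (String paragraph : paragraphs) {
            for (String line : wrap(font, paragraph, maxWidth)) {
                if (cs == null || y < MARGIN) {
                    // Close the full page (if any) and start a fresh one.
                    if (cs != null) cs.close();
                    PDPage page = new PDPage(box);
                    doc.addPage(page);
                    cs = new PDPageContentStream(doc, page);
                    y = box.getHeight() - MARGIN - FONT_SIZE;
                }
                cs.beginText();
                cs.setFont(font, FONT_SIZE);
                cs.newLineAtOffset(MARGIN, y);
                cs.showText(line);
                cs.endText();
                y -= LEADING;
            }
        }
        if (cs != null) cs.close();
        return doc;
    }

    public static void main(String[] args) throws IOException {
        List<String> paragraphs = new ArrayList<>();
        for (int i = 1; i <= 200; i++) {
            paragraphs.add("document" + i + " link " + (i * 3));
        }
        try (PDDocument doc = render(paragraphs)) {
            System.out.println("Pages: " + doc.getNumberOfPages());
            doc.save("toc.pdf");
        }
    }
}
```

The caller owns the returned PDDocument and must close it, which matches the question's wish to just get the final PDDocument back.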

Getting wrong page numbers in TOC via docx4j-export-fo

I'm using docx4j to generate Word documents, and now I need to generate a table of contents. Since version 3.3.0, docx4j uses the Plutext conversion service to get page numbers, which is inappropriate for me, so I need to use the docx4j-export-fo library for that purpose. But it produces the wrong numbering... It seems to get the wrong page size or something like that, because all the page numbers lag behind by 1-2 pages.
I've looked through the source code and the properties docx4j provides, but so far I haven't succeeded.
As per the documentation, the standalone PDF Converter (which you can download from https://converter-eval.plutext.com/ ) exists precisely to provide better accuracy than can be expected from docx4j-export-fo.
export-fo uses XSL FO to layout the document, and because the XSL FO layout model is not a precise match for Word's, there are limits to what can be achieved.
That said, improvements may be possible in individual cases. You'd need to share your docx somewhere for specific feedback.

Suggestions on extracting text from uploaded documents

I currently have a number of documents uploaded to my website on a daily basis (.doc, .docx, .odt, .pdf), and these docs are stored in a SQL database (MEDIUMBLOB).
Currently I open the docs from the database and cut and paste a text version into a field in the database for a quick reference and search function.
I'm looking to automate this "cut & paste" process (formatting isn't a real concern, as long as I can extract the text) and was hoping that some people may be able to suggest a good route to go down.
I've tried manipulating the content of the blob field using regex, but it is not really working.
I've been looking at Apache POI with a view to extracting the text at the point of upload, but I can't help thinking that this may be a bit of overkill given my relatively simple needs.
Given the various document formats I encounter and the current storing of the content in a blob field would Apache POI be the best solution to use in this instance or can anybody suggest an alternative?
Help and suggestions greatly appreciated.
Chris
Apache POI will only work for the Microsoft Office formats (.xls, .docx, .msg etc). For these formats, it provides classes for working with the files (always read, for many write support too), as well as text extractors.
For a general text extraction framework, you should look at Apache Tika. Tika uses POI internally to handle the Microsoft formats, and uses a number of other libraries to handle different formats. Tika will, for example, handle both PDF and ODF/ODT, which are the other two file formats you mentioned in the question.
There are some quick-start tutorials and examples on the Apache Tika website; I'd suggest you have a look through them. It's very quick to get started with, and you should be able to easily change your code to send the document through Tika during upload to get a plain-text version, or even XHTML if that's more helpful to you.
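As a rough illustration of the Tika route, the sketch below hands an uploaded file's bytes to the Tika facade, which detects the format and returns plain text. It assumes tika-core plus the Tika parser modules are on the classpath (with tika-core alone, Tika can detect the type but returns little or no text); the class name is hypothetical, and reading the MEDIUMBLOB column into a byte[] is assumed to happen elsewhere, e.g. via ResultSet.getBytes.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;

import org.apache.tika.Tika;
import org.apache.tika.exception.TikaException;

/** Hypothetical helper: plain-text extraction from an uploaded file's bytes. */
public class UploadTextExtractor {

    // Reuse one Tika facade instead of creating one per upload.
    private static final Tika TIKA = new Tika();

    /** Detects the format (.doc, .docx, .odt, .pdf, ...) and returns the plain text. */
    static String extractText(byte[] fileBytes) throws IOException, TikaException {
        try (InputStream in = new ByteArrayInputStream(fileBytes)) {
            return TIKA.parseToString(in);
        }
    }

    public static void main(String[] args) throws Exception {
        // args[0]: path to a sample document, standing in for the blob contents.
        byte[] upload = Files.readAllBytes(Paths.get(args[0]));
        System.out.println(extractText(upload));
    }
}
```

Note that parseToString caps the returned text at a maximum length by default (see setMaxStringLength), which matters for long documents stored in the search field.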

Writing text to a PDF file

I have several PDF files (about 20), and every month or so I need to change some fields with new data. This is a very time-consuming task, and I would like to know if there is an easy way, via some sort of application, where users can change the names of the variables that have to be stored in the different PDF files. This would be an enormous time saver. Thanks for any help.
There are lots of solutions for this. If you are willing to write some code, things can get really interesting.
A simple solution would be to create a template PDF file with placeholder fields (like #{name}, #{age}, etc.). When a new PDF needs to be created with new values, you can simply use iText to edit the PDF and replace the placeholders with the actual values.
You can also use JasperReports for this, but it would be overkill for just 20-odd documents.
If you are interested in a sample program, I'd be happy to provide one.
If you have form fields in the PDF file then you may use Aspose.Pdf (.NET or Java version) to fill data into those fields programmatically. You can either fill the fields using individual values or import the data from the XML/FDF/XFDF files etc. You can take a template PDF and save the output PDF files with different values. Please see if this might help in your scenario.
Disclosure: I work as developer evangelist at Aspose.
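The form-field approach can also be done with PDFBox, the library from the first question, instead of Aspose or iText. A minimal sketch assuming PDFBox 2.0.x and a template PDF that already contains AcroForm text fields (in PDFBox 3.0, PDDocument.load is replaced by Loader.loadPDF); the class name, file names and field names are hypothetical. Note that this fills form fields rather than replacing #{name}-style text, since text drawn directly into a page's content stream generally can't be located and replaced reliably.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.interactive.form.PDAcroForm;
import org.apache.pdfbox.pdmodel.interactive.form.PDField;

/** Hypothetical helper: fills AcroForm fields in a template PDF (PDFBox 2.0.x). */
public class FormFiller {

    /** Sets each field's value; returns the names that were not found in the form. */
    static List<String> fillFields(PDDocument doc, Map<String, String> values) throws IOException {
        List<String> missing = new ArrayList<>();
        PDAcroForm form = doc.getDocumentCatalog().getAcroForm(); // null if the PDF has no form
        for (Map.Entry<String, String> e : values.entrySet()) {
            PDField field = form == null ? null : form.getField(e.getKey());
            if (field == null) {
                missing.add(e.getKey());
            } else {
                field.setValue(e.getValue());
            }
        }
        return missing;
    }

    public static void main(String[] args) throws IOException {
        Map<String, String> values = new LinkedHashMap<>();
        values.put("name", "Jane Doe");   // hypothetical field names and values
        values.put("month", "2024-05");

        try (PDDocument doc = PDDocument.load(new File("template.pdf"))) {
            List<String> missing = fillFields(doc, values);
            if (!missing.isEmpty()) System.err.println("Fields not found: " + missing);
            // Optionally flatten so the values become ordinary page content:
            // doc.getDocumentCatalog().getAcroForm().flatten();
            doc.save("filled.pdf");
        }
    }
}
```

Running this once per data set against the same template gives one filled copy per month, while the template itself stays untouched.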

Generate PDF from structured data

I want to be able to generate a highly graphical (with lots of text content as well) PDF file from data that I might have in a database or xml or any other structured form.
Currently our graphic designer creates these PDF files in Photoshop manually after getting the content as a MS Word Document. But usually, there are more than 20 revisions of the content; small changes here and there, spelling corrections, etc.
The 2 disadvantages are:
1) The graphic designer's time is unnecessarily occupied. The first version is the only one he/she should have to work on.
2) The PDF file becomes the document which now has the final revised content, and the initial content is out of sync with it. So if the initial content needs to be somewhere else (like on a website), we need to recreate it from the PDF file.
Generating the PDF file will help me solve both these problems. Perhaps some way in which the graphic designer creates a "Template" and then puts in tags/holders and maps these tags/holders to the relevant data.
Thanks :-)
There are some tools out there for doing this. XSL-FO is useful. Here is a tutorial for creating a PDF from XML (or XHTML) with Cocoon. Also see Apache FOP.
You could format your SQL data as XML and still use the same templates this way.
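To make the XSL-FO route more concrete, here is a sketch of embedding Apache FOP in Java: the data XML (e.g. exported from the database) is run through an XSLT stylesheet that produces XSL-FO, and FOP renders that to PDF. It assumes FOP 2.x (FopFactory.newInstance(URI) is the 2.x factory method); the class name and command-line arguments are hypothetical, and the graphic designer's "template" would be the XSLT file.

```java
import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.OutputStream;

import javax.xml.transform.Result;
import javax.xml.transform.Source;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXResult;
import javax.xml.transform.stream.StreamSource;

import org.apache.fop.apps.FOPException;
import org.apache.fop.apps.Fop;
import org.apache.fop.apps.FopFactory;
import org.apache.fop.apps.MimeConstants;

/** Hypothetical helper: XML data + XSLT template -> XSL-FO -> PDF via Apache FOP 2.x. */
public class XmlToPdf {

    // Base URI "." is used to resolve relative resources such as images.
    private static final FopFactory FOP_FACTORY = FopFactory.newInstance(new File(".").toURI());

    /** Transforms xml with the stylesheet (or passes XSL-FO through if xsl is null) into PDF. */
    static void render(Source xml, Source xsl, OutputStream pdfOut)
            throws FOPException, TransformerException {
        Fop fop = FOP_FACTORY.newFop(MimeConstants.MIME_PDF, pdfOut);
        TransformerFactory tf = TransformerFactory.newInstance();
        Transformer transformer = xsl == null ? tf.newTransformer() : tf.newTransformer(xsl);
        // FOP consumes the transformation's XSL-FO output as SAX events.
        Result result = new SAXResult(fop.getDefaultHandler());
        transformer.transform(xml, result);
    }

    public static void main(String[] args) throws Exception {
        // args[0]: data XML, args[1]: XSLT template that produces XSL-FO
        try (OutputStream out = new BufferedOutputStream(new FileOutputStream("output.pdf"))) {
            render(new StreamSource(new File(args[0])), new StreamSource(new File(args[1])), out);
        }
    }
}
```

Content revisions then only touch the XML data, while layout changes only touch the stylesheet.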
I use the ReportLab python library for this. It could perhaps solve your problem, but you will need to do some work...
In the past I have written scripts that spit out LaTeX, then used texi2pdf to solve this kind of problem.
Take a look at iReport and JasperReports at http://jasperforge.org.
iReport lets you design reports, and then you can either fill them programmatically with the JasperReports library (Java) or just use iReport to create the report manually.
I have only used it for tabular data, but I don't think there would be any problem for other types of documents.
You could create a form and populate the entries programmatically using a pdf library like iText (Java).
You could look at doing the workflow in PostScript which is plain text that you can easily compose from fragments. Then you can use any free tool to convert to PDF.
Take a look at Prince XML. This tool lets you generate PDF from XML or HTML and CSS.
A possible way is to use a template engine, like FreeMarker or StringTemplate: these are often used to generate HTML, but they are flexible enough to output any format, actually.
The problem is to make a PDF template, I suppose. Perhaps you can take a sample output and edit it to replace data with placeholders to be filled by the template engine. Might not be trivial!
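A sketch of the template-engine idea with FreeMarker: the template is plain text (here a fragment of XSL-FO, which a tool like Apache FOP could then turn into PDF) and ${...} placeholders are filled from a map. It assumes FreeMarker 2.3.23 or later; the class name and data are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.util.HashMap;
import java.util.Map;

import freemarker.template.Configuration;
import freemarker.template.Template;
import freemarker.template.TemplateException;

/** Hypothetical helper: merges data into a text template (e.g. XSL-FO or HTML). */
public class TemplateFill {

    // Raise the version constant to match the FreeMarker release you use.
    private static final Configuration CFG = new Configuration(Configuration.VERSION_2_3_23);

    /** Parses templateSource and replaces ${...} expressions with values from data. */
    static String fill(String templateSource, Map<String, Object> data)
            throws IOException, TemplateException {
        Template template = new Template("inline", new StringReader(templateSource), CFG);
        StringWriter out = new StringWriter();
        template.process(data, out);
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        Map<String, Object> data = new HashMap<>();
        data.put("title", "Spring catalogue");
        String fo = fill("<fo:block font-size=\"18pt\">${title}</fo:block>", data);
        System.out.println(fo); // prints <fo:block font-size="18pt">Spring catalogue</fo:block>
    }
}
```

If the data can contain characters such as & or <, they need XML escaping before the output goes into an XML toolchain; FreeMarker's XML output format can handle that automatically.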
Sounds like a job that SQL Server Reporting Services can handle quite easily.
Reporting Services allows you to query the data, define the layout, and export to PDF without any intervention. The PDF output can be distributed via email, stored on a file share, and accessed via a page on the report server.
It can handle XML data sources too.
Another approach to generating a PDF file from data is to use Prawn, which is based on Ruby. I was very pleasantly surprised by how much functionality is included in Prawn. It may take some investment up front, but this approach will give you a lot of flexibility.
You can combine CSStoXSLFO with XEP from RenderX for high quality output. With this solution you can merge XML data into an XHTML template, which is decorated with CSS. It can also generate charts with the fantastic JFreeChart library. CSS3 page media features are supported.