Convert Informix file into pdf - pdf

Is it possible to convert a file into PDF directly from Informix?
Is there a command for it? If it's not possible, what should I do?
For example: I want to convert the file sm2026.4gl into PDF.

I'm not sure I understand the problem correctly. It looks like you're trying to get a PDF version of a 4GL program, which doesn't make any sense. If that really is what you want, there are any number of free websites that can do this for you.
If, however, you're asking how to get a 4GL report to generate a PDF, that's a considerably more interesting problem. Informix-4GL will not write a PDF file natively. If I remember correctly, 4Js Genero will, and Querix Lycia might.
However, on Linux, there is a PDF printer driver (cups-pdf) that will output the report to a PDF file.
Implementation of this is left as an exercise to the reader. :-)

There are a number of options available to you. What you choose will depend on how much time and money you are prepared to invest. The more you are prepared to invest, the better the reports you will get.
Search for scripts or executables that take a text file and convert it to PDF. Examples include txt2pdf. These work on any text file and so are independent of the 4gl. You would amend your 4gl code to execute the converter via RUN immediately after your FINISH REPORT.
Write 4gl libraries to create valid PDF output. That involves reading the PDF manuals to see the structure required in a PDF file. The first line of the resulting file will begin "%PDF". This is a lot of work. I did it 15-20 years ago, and I wouldn't do it again; only consider it if you want the control and independence it gives you.
Using a product such as FourJs Genero that will allow you to use your existing 4gl code to create PDF reports directly. At its simplest this involves adding a couple of lines prior to your START REPORT and leaving the REPORT statement intact. The report will use a monospaced font and look like your existing report, only it is a PDF instead of a TXT file.
IF fgl_report_loadCurrentSettings(NULL) THEN -- simple compatibility mode
    CALL fgl_report_selectDevice("PDF") -- indicates PDF
    ... -- optional calls to indicate filename, paper, printer and other options if required
    LET grw = fgl_report_commitCurrentSettings()
    START REPORT report-name TO XML HANDLER grw
END IF
With this option, a few extra configuration options are available, such as adding watermarks or logos to every page, which the free tools you find might not provide.
The more feature-rich option, using FourJs Genero Report Writer, involves stripping out any layout information from your 4gl code and designing the layout of the report in a WYSIWYG designer. Your 4gl code that gathers the data from the database, and the functions that formulate the output, are untouched. The REPORT statement no longer requires layout information such as COLUMN 10 or SKIP TO TOP OF PAGE, so that can be removed. The WYSIWYG design of the report controls the layout, including a full range of properties such as font, font attributes, positioning, page breaks, page numbering and images. So your 4gl code becomes:
-- Report
-- No layout information in report, only need to gather and formulate data
REPORT report-name ...
    BEFORE GROUP OF invoice
        PRINT invoice.*
    ON EVERY ROW
        PRINT invoice_line.*
    AFTER GROUP OF invoice
        LET invoice_total.net = GROUP SUM(...)
        PRINT invoice_total.*
END REPORT
...
-- Produce report
IF fgl_report_loadCurrentSettings("reportdesign.4rp") THEN -- load the WYSIWYG design
    CALL fgl_report_selectDevice("PDF") -- indicates PDF
    ... -- calls to indicate filename, paper, printer and other options if required
    LET grw = fgl_report_commitCurrentSettings()
    START REPORT report-name TO XML HANDLER grw
END IF
As shown there are a number of options available to you. As to what you should do, that depends on what your goal is, and how much time and money you are prepared to invest to achieve that goal.
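To give a feel for the second option (writing the PDF structure yourself), here is a minimal sketch in Python rather than 4gl. It is not the library described above: the function name text_to_pdf, the Letter page size, the margins and the choice of Courier are all assumptions made for illustration. It writes a single page and does no pagination, so long reports would run off the bottom.

```python
def text_to_pdf(lines, font_size=10, leading=12):
    """Build a one-page PDF (as bytes) that shows `lines` in Courier.

    Hypothetical helper for illustration: one page only, no pagination.
    """
    def esc(s):
        # Backslash and parentheses are special inside PDF string literals.
        return s.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")

    # Content stream: begin text, select font, set line spacing,
    # move to the top-left corner, then show one string per line.
    ops = ["BT", f"/F1 {font_size} Tf", f"{leading} TL", "36 756 Td"]
    for line in lines:
        ops.append(f"({esc(line)}) Tj T*")
    ops.append("ET")
    stream = "\n".join(ops).encode("latin-1", "replace")

    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] "
        b"/Resources << /Font << /F1 4 0 R >> >> /Contents 5 0 R >>",
        b"<< /Type /Font /Subtype /Type1 /BaseFont /Courier >>",
        b"<< /Length %d >>\nstream\n" % len(stream) + stream + b"\nendstream",
    ]

    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for num, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset of each object, needed by xref
        out += b"%d 0 obj\n" % num + body + b"\nendobj\n"

    # Cross-reference table: every entry is exactly 20 bytes long.
    xref_pos = len(out)
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f \n"
    for off in offsets:
        out += b"%010d 00000 n \n" % off
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\n" % (len(objects) + 1)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_pos
    return bytes(out)
```

Writing the returned bytes to a file opened in binary mode, for example open("report.pdf", "wb"), gives a file a PDF viewer should open. Even this small case needs exact byte offsets, which hints at why a full library is a lot of work.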

Related

Tables or images too wide in Pandoc output as DOCX or PDF/LaTeX

I am writing a quick and dirty report using pandoc and markdown.
I need to generate a PDF or a DOCX with minimum hassle, I don't care much about which (best would be both, of course). Also, I am somewhat constrained regarding the figures and tables -- they have been generated a priori with another program and I would rather be able to insert them as they are than to convert them to suit pandoc's needs.
However, the main constraint is that I don't want to edit the resulting document manually, be that LaTeX or DOCX. I want to do all editing in markdown.
Here is the problem:
In DOCX, the tables are displayed fine: they have the width of the document. However, the figures are much too wide. I can either convert the images to a lower resolution (which doesn't look nice), or manually resize the images in Word (which is out of the question).
In PDF, the generated figures are fine (more or less), however another two problems appear:
The tables are too wide, because there are no line breaks, and
LaTeX being LaTeX, the order of figures and tables is "reorganized", that is, they are not consecutive.
Thus, none of the documents generated are usable for my purposes.
All I wanted to do is to slap together some results and generate a file that I can send to another scientist.
Question: what is the best solution to generate a quick and dirty report in pandoc with minimum effort and at least all results visible?
Update: Upgrading pandoc to 1.4 or later solves the issue -- the figures now have the correct sizes in docx documents.
Control over image size
Currently you cannot control that feature directly from Markdown. For LaTeX/PDF output, this is automatically handled by LaTeX/pdflatex itself.
In recent months there have been some discussions going on in the Pandoc developer + user community about how to best implement it and create an easy-to-use syntax, for example
![Image Caption](./path/to/image.jpg "Image Comment"){width="60%", height="150px"}
(Warning: Example only, made up on the spot + extracted from thin air by myself -- can't remember the latest state of the discussion...) This is designed to then transfer to all the supported output formats which can contain images, not just to LaTeX/PDF.
So something along these lines is planned to be a major new feature for the next major release of Pandoc, and will start to be working better in ODT/DOCX output as well.
Control over table/cell widths and line breaks within cells
How exactly do you specify your tables in Markdown syntax?
Are you aware that Pandoc supports several variations like grid_tables, pipe_tables, simple_tables and multiline_tables?
You should look into using pandoc --from=markdown+multiline_tables ... as your command and write the critical tables as multiline_tables in your Markdown.
Read all about the details via man pandoc_markdown...
Multiline tables give you a limited control over the width of individual columns in the output, just by widening or narrowing the column widths in the markdown source itself.
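Since the tables in the question come from another program, one way to get them into this format is to generate the multiline table text with a small script. The sketch below is only an illustration: the function name multiline_table and its widths parameter (column widths in characters) are made up here, and it assumes left-aligned columns and cell text that does not itself start with dashes.

```python
import textwrap


def multiline_table(headers, rows, widths):
    """Render rows as a pandoc multiline table; `widths` are column widths in characters."""
    def render(cells):
        # Wrap each cell to its column width, then pad cells to a common height.
        wrapped = [textwrap.wrap(str(c), w) or [""] for c, w in zip(cells, widths)]
        height = max(len(col) for col in wrapped)
        lines = []
        for i in range(height):
            parts = [(col[i] if i < len(col) else "").ljust(w)
                     for col, w in zip(wrapped, widths)]
            lines.append(" ".join(parts).rstrip())
        return lines

    full_rule = "-" * (sum(widths) + len(widths) - 1)
    column_rule = " ".join("-" * w for w in widths)  # dash runs define the column widths
    out = [full_rule, *render(headers), column_rule]
    for row in rows:
        out.extend(render(row))
        out.append("")  # a blank line ends each row
    out.append(full_rule)
    return "\n".join(out) + "\n"
```

Changing the numbers in widths changes the length of the dash runs, which is the same manual widening or narrowing described above, just done by the script.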
Order of figures and tables when outputting LaTeX/PDF
Pandoc supports the insertion of raw_tex lines and environments into the Markdown source file. When it encounters such lines, it transmits them unchanged into its LaTeX output. (But they will be ignored for all other outputs.)
So you can insert lines like
\newpage{}
into the Markdown to enforce a page break. This already gives you some limited control over keeping the order of mis-behaving figures or tables. (After all, you said you look for a "quick and dirty" method, not a sophisticated typeset document...)
Of course, if you know LaTeX better, you can also use stuff like
\FloatBarrier (from the placeins package) inside your Markdown.
Going down that road (mixing LaTeX code into Markdown) gives you a few disadvantages:
The Markdown will not look as pretty any more.
The Markdown will not work fully with other output formats (should you need them).
But the advantages still are:
You will be writing and modifying the document text much faster in Markdown than authoring it in LaTeX.
You have some additional control over the final look of your PDF:
order of tables + figures
look + width of tables + figures (because, you can of course insert a complete LaTeX 'figure' or 'table' environment).

Rule based PDF text extraction for various bills and invoices

I have to extract text from PDF files of invoices and bills.
The file layouts can get complex, though they are mostly filled with tables.
I've read a few dozen articles already about the PDF format, how easy it is for our brain to grasp it and how hard it is for a machine to understand its structure.
I also downloaded a few tools like Python's pdfminer and some Java tools. Some even have rule-based layout extraction, like LA-PDFText. These are all great libraries, but they leave the final step to you.
Adobe also has an online service called exportPdf, but it can't be customized.
Bottom line, I understand that in order to extract text from structured pdf files and convert it to XML for example, there should be some level of manual work.
I also found A-PDF Form Data Extractor, a non-free tool with the ability to set extraction rules that claims to do the job, though it's hard to find a proper manual and it runs only on Windows.
I thought I might even try to convert those files to images and run tesseract-ocr on them, but decided to ask for advice here before I spend more time on it.
I'll be very grateful if someone with such experience can give me a hint.
I've done a lot of PDF extraction and I can confirm, as you've already discovered, that it can be a painful process to start. One of the important things to understand is that there is no concept of "tables" within a PDF, just text that happens to have lines around it. Also, there's no guarantee that the linear order of text within the PDF code actually matches the visual order when printed. In other words, there's no guarantee that "hello world" is written in that order; it could be draw 'world' at coord 20 then draw 'hello' at coord 10. Most PDF creators don't do this, but still there's no guarantee. The more creative a PDF creator is (InDesign, Illustrator, etc.) the harder the text is likely to be to get out. And once a designer starts messing with fonts too much, some programs will actually output words one character at a time, changing the font just slightly each time.
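To illustrate the ordering problem, here is a minimal sketch that rebuilds reading order from positioned text fragments. The (x, y, text) tuple shape, the function name reading_order and the line_tolerance value are assumptions for illustration; a real extraction library will hand you positions in its own structures, and real invoices need more care (columns, rotated text, per-character output).

```python
def reading_order(fragments, line_tolerance=2.0):
    """Group (x, y, text) fragments into lines and order them visually.

    Assumes y grows upward, as in default PDF user space, so a larger y
    means higher on the page.
    """
    lines = []  # each entry: [y_of_first_fragment, [(x, text), ...]]
    for x, y, text in sorted(fragments, key=lambda f: -f[1]):
        if lines and abs(lines[-1][0] - y) <= line_tolerance:
            # Close enough vertically: treat as the same visual line.
            lines[-1][1].append((x, text))
        else:
            lines.append([y, [(x, text)]])
    # Within each line, order fragments left to right.
    return [" ".join(t for _, t in sorted(parts)) for _, parts in lines]
```

With the example from the paragraph above, reading_order([(20, 700, "world"), (10, 700, "hello")]) puts "hello" before "world" even though it was drawn second.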
That said, I'd recommend the first one that you looked at, LA-PDFText. You can run it in discovery mode (blockify) from which you can create rules. I don't have Java installed anymore so I can't test it but it seems very promising.
Your second one, A-PDF Form Data Extractor, only really works with actual PDF forms. If this is your case I'd recommend just using an open source solution like iText/iTextSharp.
The last OCR one makes me cringe. I just can't imagine going through those hoops would get you better text representation than parsing the PDF. But then again, PDF is a visual format so maybe it would.
Personally I use iText/iTextSharp for this kind of thing but I also like to do things the hard way.
It is not clear if you are looking for the development tool to automate the data extraction from bills and invoices or just for the one time tool (utility) that can be used by the non-developer?
Anyway here are some specialized tools including engines they use:
Tabula (open-source, especially designed to extract data from tables in PDF. Can export shell scripts for batch processing, runs as the localhost web service, powered by JRuby Tabula engine)
Viet OCR (open-source .NET desktop utility for text extraction from PDF and images, based on tesseract OCR engine)
Bytescout PDF Viewer (freeware closed source .NET utility, detects and extracts tables, including scanned invoices, powered by PDF Extractor SDK)
DISCLAIMER: I work for ByteScout.

writing text to a pdf file

I have several pdf files (about 20) and every month or so I need to change some fields with new data. This is a very time consuming task and I would like to know if there is an easy way via some sort of application where users can change the values of the variables that have to be stored into the different pdf files. This would be an enormous time saver. thanks for any help.
there are lots of solutions for this.. if you are willing to write some code things can get really interesting.
a simple solution would be to create a template pdf file with placeholder fields (like #{name}, #{age} etc.). when a new pdf needs to be created using new values you can simply use iText to edit the pdf & replace the placeholders with actual values.
you can also use jasperreports for this but it would be an overkill for just 20 odd documents.
if you are interested in a sample program i'd be happy to provide you one.
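As a rough illustration of the placeholder idea only, here is a sketch of the substitution step in Python. It works on a plain string; getting the text out of and back into an actual PDF still needs a PDF library and is not shown. The function name fill_placeholders is made up for this example.

```python
import re

# Matches placeholders of the form #{name}
PLACEHOLDER = re.compile(r"#\{(\w+)\}")


def fill_placeholders(template, values):
    """Replace every #{name} in `template` with values[name].

    Raises KeyError for a placeholder with no supplied value, so a
    half-filled document is never produced silently.
    """
    def substitute(match):
        key = match.group(1)
        if key not in values:
            raise KeyError(f"no value supplied for placeholder {key!r}")
        return str(values[key])

    return PLACEHOLDER.sub(substitute, template)
```

Keeping the monthly values in one dictionary (or a file loaded into one) means the same data can be applied to all 20 templates in a loop.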
If you have form fields in the PDF file then you may use Aspose.Pdf (.NET or Java version) to fill data into those fields programmatically. You can either fill the fields using individual values or import the data from the XML/FDF/XFDF files etc. You can take a template PDF and save the output PDF files with different values. Please see if this might help in your scenario.
Disclosure: I work as developer evangelist at Aspose.

Dynamic PDF features

I've been asked to write a program which generates reports in the form of PDF files. There are two main dynamic features which have been asked for, which I'm not sure are even possible:
1) The report contains a table with several columns. Users should be able to click on the column header to sort the table rows by the values in that column.
I've never seen a PDF file that users can click on to re-sort table results, but I'm told that this is possible.
2) The report should have a dropdown box which users can select to toggle which rows of the table are displayed or hidden.
I'm fairly sure that this isn't possible to do in a PDF file, though I've been told otherwise.
So my question is, which of these things are even possible, and what library should I use for generating PDF files? (The library can be in any programming language.)
Don't use PDF as a substitute for html/CSS/JavaScript/etc. PDF is best when it's used as an immutable document format, not as a poor man's web page. Sure, you can put your foot in a box and call it a shoe, but it's really just a box.
Have a look at
Sorting tables in dynamic PDF on the Adobe Developer Connection website.
You can also download a ready-to-study sample PDF with that feature built in.
I would look at Acrobat. There is a JavaScript implementation for it.
http://www.adobe.com/devnet/acrobat/javascript.html
For Java there are the following tools / libraries that are very good and stable:
JasperReports - you design your report in a graphical designer and then populate it with data programmatically.
The other is iText. It works at a lower level (actually JasperReports is built on top of it for the PDF part), so it might support the requested sorting options.
Yes, all of those dynamic features are possible with an XFA PDF form (created in LiveCycle Designer) and scripting ( JavaScript). We have examples of sorting rows in tables and hiding and showing sub-forms at http://www.pdfscripting.com , but you must be a member to access them (not free). You may be able to find free sample files doing an internet search for XFA PDFs or LiveCycle Designer PDFs- not sure but it is possible at any rate.
Dimitri
WindJack Solutions
http://www.windjack.com

Generate PDF from structured data

I want to be able to generate a highly graphical (with lots of text content as well) PDF file from data that I might have in a database or xml or any other structured form.
Currently our graphic designer creates these PDF files in Photoshop manually after getting the content as a MS Word Document. But usually, there are more than 20 revisions of the content; small changes here and there, spelling corrections, etc.
The 2 disadvantages are:
1) The graphic designer's time is unnecessarily occupied. The first version is the only one he/she should have to work on.
2) The PDF file becomes the document which now has the final revised content, and the initial content is out of sync with it. So if the initial content needs to be somewhere else (like on a website), we need to recreate it from the PDF file.
Generating the PDF file will help me solve both these problems. Perhaps some way in which the graphic designer creates a "Template" and then puts in tags/holders and maps these tags/holders to the relevant data.
Thanks :-)
There are some tools out there for doing this. XSL-FO is useful. Here is a tutorial for creating a pdf from xml (or xhtml) with cocoon. Also see Apache FOP.
You could format your SQL data as XML and still use the same templates this way.
I use the ReportLab python library for this. It could perhaps solve your problem, but you will need to do some work...
In the past I have written scripts that spit out LaTeX then used texi2pdf to solve this kind of problem.
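A minimal sketch of that kind of script, in Python: it turns rows of data into a small LaTeX document containing one table. The function names latex_escape and rows_to_latex are made up for this example, and the plain article class with a left-aligned tabular is just an assumed starting point, not the layout a designer would want.

```python
# Characters that have special meaning in LaTeX and their escaped forms.
_LATEX_SPECIALS = {
    "\\": r"\textbackslash{}", "&": r"\&", "%": r"\%", "$": r"\$",
    "#": r"\#", "_": r"\_", "{": r"\{", "}": r"\}",
    "~": r"\textasciitilde{}", "^": r"\textasciicircum{}",
}


def latex_escape(text):
    """Escape one value character by character so nothing is escaped twice."""
    return "".join(_LATEX_SPECIALS.get(ch, ch) for ch in str(text))


def rows_to_latex(headers, rows):
    """Return a complete LaTeX document with one left-aligned table."""
    colspec = "l" * len(headers)
    body = [" & ".join(latex_escape(c) for c in row) + r" \\" for row in rows]
    return "\n".join([
        r"\documentclass{article}",
        r"\begin{document}",
        r"\begin{tabular}{" + colspec + "}",
        " & ".join(latex_escape(h) for h in headers) + r" \\ \hline",
        *body,
        r"\end{tabular}",
        r"\end{document}",
        "",
    ])
```

Saving the result as, say, report.tex and running texi2pdf on it would then produce the PDF; the rows can come from a database query or an XML file.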
Take a look at iReport and JasperReports at http://jasperforge.org.
iReport lets you design reports, and then you can either programmatically fill them with the JasperReports library (Java), or just use iReport to manually create the report.
I have only used it for tabular data, but I don't think there would be any problem for other types of documents.
You could create a form and populate the entries programmatically using a pdf library like iText (Java).
You could look at doing the workflow in PostScript which is plain text that you can easily compose from fragments. Then you can use any free tool to convert to PDF.
Take a look at Prince XML. This tool lets you generate PDF files from XML or HTML and CSS.
A possible way is to use a template engine, like FreeMarker or StringTemplate: these are often used to generate HTML, but they are flexible enough to output any format, actually.
The problem is to make a PDF template, I suppose. Perhaps you can take a sample output and edit it to replace data with placeholders to be filled by the template engine. Might not be trivial!
Sounds like a job that SQL Server Reporting Services can handle quite easily.
Reporting Services allows you to query the data, define the layout, and export to PDF without any intervention. The PDF output can be distributed via email, stored on a file share, and accessed via a page on the report server.
It can handle XML data sources too.
Another approach to generating a PDF file from data is to use prawn, which is based on ruby. I was very pleasantly surprised by how much functionality is included in prawn. It may take some investment up front but this approach will give you a lot of flexibility.
You can combine CSStoXSLFO with XEP from RenderX for high quality output. With this solution you can merge XML data into an XHTML template, which is decorated with CSS. It can also generate charts with the fantastic JFreeChart library. CSS3 page media features are supported.