How do I create a PDF directly from code? - pdf

Is there an IDE (that preferably uses C), that allows me to output the results of a program onto a PDF?
I am tasked with writing a program that does a specific set of computations in which the results of each individual computation needs to be output onto a PDF file that can be opened later normally. Is there an IDE that would allow me to do such a thing? I'm currently using Codeblocks and I'm not sure whether it can do what I need as I'm not too familiar with it.

If your question is simply how to printout your code as a PDF most if not all IDEs can do that via printing.
If its how to hand write a pdf with an ID Editor you need to learn how to hand code a pdf first.
However for code::blocks see 2.14 Source Code Exporter in the PDF User Manual

Related

Convert Informix file into pdf

Is it possible to convert file into PDF directly from Informix?
Is there a command for it? If its not possible, then what should I do?
For example: I want to convert file sm2026.4gl into PDF.
I'm not sure if I understand the problem correctly, it looks like you're trying to get a PDF version of a 4GL program, which doesn't make any sense. There are any number of free websites that can do this for you.
If, however, you're asking how to get a 4GL report to generate a PDF, that's a considerably more interesting problem. Informix-4GL will not write a PDF file natively. If I remember correctly, 4Js Genero will, and Querix Lycia might.
However, on Linux, there is a PDF printer driver (cups-pdf) that will output the report to a PDF file.
Implementation of this is left as an exercise to the reader. :-)
There are a number of options available to you. What you choose will depend on how much time and money you are prepared to invest. The more you are prepared to invest, the better reports you will get
Do a google search for scripts/executables that will take a text file and convert to PDF. Examples include txt2pdf. These work on any text files and so are independent of the 4gl. You would amend your 4gl code to execute this via RUN immediately after your FINISH REPORT
Write 4gl libraries to create valid PDF output. That involves you reading the PDF manuals to see the structure required in a PDF file. The first line of the resulting file will begin "%PDF". This is a lot of work, I did it 15-20 years ago, I wouldn't do it again unless you want the control and independence this gives you.
Using a product such as FourJs Genero that will allow you to use your existing 4gl code to create PDF reports directly. At its simplest this involves adding a couple of lines prior to your START REPORT and leaving the REPORT statement intact. The report will use a monospaced font and look like your existing report, only it is a PDF instead of a TXT file.
IF fgl_report_loadCurrentSettings(NULL) THEN -- simple compatibility mode
CALL fgl_report_selectDevice("PDF") -- indicates PDF
... -- optional calls to indicate filename, paper, printer and other options if required
LET grw = fgl_report_commitCurrentSettings()
START REPORT report-name TO XML HANDLER grw
Using this option there are a few extra config options that are available such as adding watermarks/logos to every page that the free tools you find might not provide
The more feature rich option using FourJs Genero Report Writer involves stripping out any layout information from your 4gl code and designing the layout of the report in a WYSIWYG designer. Your 4gl code that gathers the data from the database and the functions that formulate the output is untouched. The REPORT statement no longer requires the layout information such as COLUMN 10, SKIP TO TOP OF PAGE, and that can be removed. The WYSIWYG design of the report controls the layout including full range of properties such as font, font attributes, positioning, page breaks, page numbering, images. So your 4gl code becomes
-- Report
-- No layout information in report, only need to gather and formulate data
REPORT report-name ...
BEFORE GROUP OF invoice
PRINT invoice.*
ON EVERY ROW
PRINT invoice_line.*
AFTER GROUP OF invoice
LET invoice_total.net = GROUP SUM(...)
PRINT invoice_total.*
END REPORT
...
-- Produce report
IF fgl_report_loadCurrentSettings("reportdesign.4rp") THEN -- load the WYSIWYG design
CALL fgl_report_selectDevice("PDF") -- indicates PDF
... -- calls to indicate filename, paper, printer and other options if required
LET grw = fgl_report_commitCurrentSettings()
START REPORT report-name TO XML HANDLER grw
As shown there are a number of options available to you. As to what you should do, that depends on what your goal is, and how much time and money you are prepared to invest to achieve that goal.

Replace words/phrases in existing PDF or docx with other words

I am trying to make a dynamic PDF generator as an .NET Core API. I want to take an existing PDF, or .docx file, and edit it so it replaces the current name (John Doe) with something that can be replaced like #NAME_PLACEHOLDER.
I then want to transform #NAME_PLACEHOLDER -> John Doe (or whatever is in the KeyValuePair or Dictionary<string, string>).
I am running this on a Docker environment, so I can easily execute commands and I am willing to do that as well.
So far I have tried a few things:
1) pdf2htmlEX
Executes as pdf2htmlEX file.pdf
Does the job pretty well
Can be converted back to PDF using Google Chrome headless or similar
Problem: Only the characters used in the PDF can be used to replace. So if I only use A, B, C as characters, it will make D into Times New Roman (or default font)
2) LibreOffice ODT to PDF
This was pretty nice, because I could simply unzip the .odt file, open content.xml, search and replace, then save it as an .odt file again
Could be converted into PDF rather easily using soffice --convert-to pdf
LibreOffice is quite nice
Problem 1: Microsoft Word -> Save as ODT tends to break the formatting, so we have to use LibreOffice to go and change it back again
Problem 2: We don't want to move away from Microsoft's Office suite
3) HTML to PDF using Chrome Headless
What you see is what you get
By far the best option, if we're all developers aaand have unlimited time
Problem 1: Only our developers can make changes, since our marketing department do not know HTML
Problem 2: Our existing PDFs would have to be rewritten in HTML
As you can see, I have tried a bunch of things. None of them, except Chrome Headless, has lived up to my expectations. What I really like about #3 is what you see is what you get. I can make the whole thing in HTML, press CTRL+P and see what it looks like as a finished PDF, basically.
I am looking for a better solution, though. It can be paid. It can be free. All I need is to change out words/phrases with other words dynamically, which apparently seems like a tough thing to do.
Thanks for specifying what you've already found clearly. It helps a lot providing a succinct answer.
The conversion is always tricky - I'm sure you know Word has trouble displaying/editing some Word documents itself.
I have experience regarding point #2 "LibreOffice ODT to PDF" and can suggest a few things to test:
Don't use Microsoft to do the docx->odt conversion. It's not good as you know. Use LibreOffice itself to do this step. The rest of your process remains the same.
For some documents, Libre Office does doc->odt much better. So, you can instead work with DOC format and get a better result without any other changes.
You won't be able to remove the devs from the process, but you can certainly reduce their role allowing your business/marketing teams to have more direct input simply by:
get the starting point document to the devs to run through the conversion process. The devs can "clean up" the document to make it convert nicely.
make this version of the document the "official" starting point. The business or technical teams can load it, adjust it, and put it back into the process.
if possible, expose a test-platform to the business teams so they can download, adjust, upload and render to PDF. This cycle means they will be able to achieve more and if they're good, do impressive stuff without any dev input.
the above steps simply mean don't expect perfect conversion of arbitrary complex documents. Starting from a (even complex) working baseline is great.
Some of that might show you that your #2 is actually going to get the best overall results.
I hope that helps.

Get selected "PostScript" from PDF

I wasn't able to find anything on the internet and I get the feeling that what I want is not such a trivial thing. To make a long story short: I'd like to get my hands on the underlying code that describes the PDF document of a selected area from a .pdf file. I've been looking for libraries or open source readers but couldn't find anything useful yet.
Does there exist something that might be able to accomplish my needs here or anything that might be reused (like an open source reader) to get there a little faster and not having to write everything from scratch?
You can convert a whole PDF document to PostScript using pdftops, one of the utilities from the poppler PDF rendering library.
This utility enables you to convert individual pages, which is at least a start.
If you just want to extract bitmapped images, try pdfimages from the same package. This extraction can also be restricted to individual pages.
The poppler library was originally written for UNIX-like systems, but there are a couple of windows builds available.
The open source tool from iText called iText RUPS does what you want, showing you all the PDF commands for a particular PDF and allow you to visualize the structure and relationships.
http://sourceforge.net/projects/itextrups/

Create documents without dependency on Office programs?

Is there any way for my app to create a document that will need saved and printed without utilizing some external software? Currently I have a spreadsheet that does a bunch of calculations then a button that runs some VBA to export 2 sheets to a PDF, then it saves and prints it. I want to port this spreadsheet into a .NET app.
I have experience with everything EXCEPT this: how do I re-create these 2 documents that are currently in Excel, without having to utilize Excel? My whole goal here is to get out of Excel...because I hate it, and I'd like to send this app to clients across the country. I really don't want to have to tell clients that they need Excel to use this. I might as well just leave it in the spreadsheet if that's the case.
I'm sorry if this is sort of "nooby", but I'm not sure what the best course of action here is. Should I try to mimic my Excel sheets on 2 hidden forms and save/print those? Should I write some HTML to produce these forms in a browser and save/print from there? Are there any other options here? I'll probably end up saving as XPS if I can find a way to get out of Excel. Would love some pointers if you have any ideas. Thanks everyone!
Edit: a little more info...I don't need help with the calculations, or exporting PDF's. I don't need any help regarding Excel or VBA. The workbook has 1 input sheet where users enter data. The results of the calculations appear on 2 other sheets in this workbook. These 2 sheets are currently exported to 1 PDF using VBA, which is then saved and printed using VBA. These spreadsheets are not "spreadsheets" like you may be thinking. They are actually "forms", for lack of a better term, that the user will never edit after running the macro to export them as PDF's. They contain text, pictures, shapes, etc. Excel is merely the medium currently being used to create these documents. My goal is to build this project in .NET and get us out of Excel, but I'm not quite sure how to reproduce these 2 forms within my app without utilizing Office. Think of them as templates. After the user enters data in my app, it will do some calculations, and the results need to appear on 2 forms that need printed and saved on the user's machine. How do I recreate these 2 forms in .NET?
Of ya, vb.net, winforms (although I could use WPF as I haven't started yet), 4.0 framework :)
Here is what you can do:
Add an empty Excel file into your resources and use it as template
When your program is to save data, you can take that file from resources and save it to hard disk.
Connect to Excel file using "Microsoft.Ace.oleDb."
Save and read data in Excel just as it was a db table - there are plenty examples on the net. Google for it.
For this project you don't need Excel application on the machine.
Now, if your concern is Excel, you don't have to worry. Your clients can use OpenOffice, for example. Or, you can save data in CSV format. CSV is not Excel. You can create your own text format and your clients will be able to read it with the Notepad. You can do HTML/XML combination and have your html page load whatever xml you supply.
Seriously, create a spreadsheet and tell your clients that they can open it with their favorite spreadsheet editor.
So instead of using a third party program or anything fancy , you could just read in the excel or xsl file and spit it out? Just write some code to format the data properly for users... There is a similar question that may help you with a tutorial - Here But this is for java, Are you using c# or vb ? .Net 4.5 ? razor ?
You can create the PDF's from code using tools like iTextSharp. Or use fillable PDF templates and use iTextsharp to fill in the form. You will need a program to print the PDF, I use the Foxit Reader. I'm sure there are some full featured PDF tools out there that include printing.
You can also doing printing from VB.Net but I would guess mixed media documents would be difficult.

Rule based PDF text extraction for verious bills and invoices

I have to extract text from invoices and bills pdf files
The files layouts can get complex, though its mostly filled with tables.
I've read a few dozens articles already about the pdf format, how easy it is for our brain to grasp it and how hard it is for a machine to understand its structure.
Also downloaded a few tools like the python's pdfminer and some java tools, some even have rule based layout extraction, like LA-PDBtext these are all great libraries, leaving you the final step.
Adobe also has an online service called exportPdf but it can't be customized
Bottom line, I understand that in order to extract text from structured pdf files and convert it to XML for example, there should be some level of manual work.
I also found From Data Extractor, a non free tool with the ability to set extraction rules that claims to do the job, though its hard to find a proper manual and it runs only on windows.
I thought I may even try a to convert those files to images and try tesseract-ocr but decided to ask for advice here before I spend more time on it.
I'll be very grateful if someone with such experience give me a hint.
I've done a lot of PDF extraction and I can confirm as you've already discovered that it can be a painful process to start. One of the important things to understand is that there is no concept of "tables" within a PDF, just text that happens to have lines around it. Also, there's no guarantee that the linear order of text within the PDF code actually matches the visual order when printed. In other words, there's no guarantee that "hello world" is written in that order, it could be draw 'word' at coord 20 then draw 'hello' at coord 10. Most PDF creators don't do this but still there's no guarantee. The more creative a PDF creator is (InDesign, Illustrator, etc) the more likely the text is going to be harder to get out. And actually, once a designer starts messing with fonts too much some programs will sometimes actually output words one character at a time, changing the font just slightly each time.
That said, I'd recommend the first one that you looked at, LA-PDFText. You can run it in discovery mode (blockify) from which you can create rules. I don't have Java installed anymore so I can't test it but it seems very promising.
Your second one, A-PDF Form Data Extractor, only really works with actual PDF forms. If this is your case I'd recommend just using an open source solution like iText/iTextSharp.
The last OCR one makes me cringe. I just can't imagine going through those hoops would get you better text representation than parsing the PDF. But then again, PDF is a visual format so maybe it would.
Personally I use iText/iTextSharp for this kind of thing but I also like to do things the hard way.
It is not clear if you are looking for the development tool to automate the data extraction from bills and invoices or just for the one time tool (utility) that can be used by the non-developer?
Anyway here are some specialized tools including engines they use:
Tabula (open-source, especially designed to extract data from tables in PDF. Can export shell scripts for batch processing, runs as the localhost web service, powered by JRuby Tabula engine)
Viet OCR (open-source .NET desktop utility for text extraction from PDF and images, based on tesseract oct engine)
Bytescout PDF Viewer (freeware closed source .NET utility, detects and extracts tables, including scanned invoices, powered by PDF Extractor SDK)
DISCLAIMER: I work for ByteScout.