itextsharp: solution on how to display a report - vb.net

I have a report that looks like this. It will be in PDF format:
Image: http://img52.imageshack.us/img52/3324/fullscreencapture121420.png
The user will enter all the different foods, so each section (NONE, MODERATE, SEVERE) will be a different size, and I need to be able to expand the sections at run time. To do that, I should probably slice up the image and add the sections at run time, but I don't know the proper way to do it.
Please suggest how to go about fitting the text into the appropriate sections, keeping in mind that I have no control over how many foods are in each section; the user decides this at run time.

I would create an iTextSharp table for each of your results (None, Moderate, Severe) and write out the table sequentially, in the order you want them to appear on your PDF. Each row in your tables would have four columns.
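The layout logic behind that suggestion can be sketched independently of iTextSharp. The following is a minimal Python sketch, not iTextSharp code: the section names come from the question, while the sample foods and the helper names `to_rows` and `build_sections` are made up for illustration. It splits each section's foods into four-column rows, so each section grows with however many foods the user enters.

```python
from itertools import zip_longest

COLUMNS = 4  # each table row holds four foods, as suggested above


def to_rows(foods, columns=COLUMNS):
    """Split a flat list of foods into rows, padding the last row with ''."""
    chunks = [iter(foods)] * columns
    return [list(row) for row in zip_longest(*chunks, fillvalue="")]


def build_sections(results, order=("NONE", "MODERATE", "SEVERE")):
    """Return (section name, rows) pairs in the order they should be written."""
    return [(name, to_rows(results.get(name, []))) for name in order]


# Hypothetical user input; the real lists are only known at run time.
results = {
    "NONE": ["apple", "rice", "oats", "pear", "kale"],
    "MODERATE": ["milk"],
    "SEVERE": [],
}
sections = build_sections(results)
```

Each `(name, rows)` pair would then map to one table: a heading for the section, followed by one table row per inner list, with one cell per entry.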
I found these articles useful for creating tables in iTextSharp:
iTextSharp - Introducing Tables
SourceForge Table Tutorial
Edit
Sorry, I didn't see the vb.net tag on your question. The pages I linked are in C# - I hope you can translate. I found that most of the iTextSharp samples you'll find are in C#.

It might be worth using a reporting tool rather than iTextSharp for formatted/tabular data?
We use Active Reports from http://www.datadynamics.com/ but I am sure there are others.
EDIT:
It looks like iTextSharp supports HTML-to-PDF conversion. Maybe that's easier to render?
Just did a search and found this: http://somewebguy.wordpress.com/2009/05/08/itextsharp-simplify-your-html-to-pdf-creation/

Related

Power Automate: Is there an operation that can split PDFs based on shared text across pages?

Any advice on this would be appreciated! I'm a newbie to Power Automate and Flows, though have watched a lot of tutorial content. I haven't seen a guide for exactly what I'm looking to do, so was hoping an experienced user could provide some advice.
What I need to do is split a PDF into smaller PDFs grouped by the entity ID numbers that appear on each page. I can't split on a fixed page increment because some entities have more pages of data than others. Generally the PDF will be about 700 pages and will be split into about 300 PDFs grouped by entity. Currently this is a labor-intensive process, and automating it would be incredible.
I'm looking into doing it with an Encodian split-PDF-by-text action, but that requires the text to be provided up front. What I need is a way to identify which pages have the same ID and group those into PDFs.
Does anyone have any experience doing something similar?
I have tried putting this together, but so far I have only found operations that split when they find a specific text string, which must be provided to the operation. What I need is a way to find the entity ID on each page, group the pages for each entity together, and split each group into its own smaller PDF file.
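The grouping step described here can be sketched outside Power Automate. This is a minimal Python illustration of the logic only, not a flow action: it assumes the text of each page has already been extracted, and the `Entity ID: <digits>` label format and the function names are assumptions made up for the example.

```python
import re

# Assumed label format; adjust the pattern to how the ID is printed on the page.
ID_PATTERN = re.compile(r"Entity ID:\s*(\d+)")


def entity_id(page_text):
    """Return the entity ID found on a page, or None if the page has none."""
    match = ID_PATTERN.search(page_text)
    return match.group(1) if match else None


def group_pages(page_texts):
    """Map each entity ID to the 1-based page numbers that carry it."""
    groups = {}
    for number, text in enumerate(page_texts, start=1):
        groups.setdefault(entity_id(text), []).append(number)
    return groups


# Hypothetical page texts standing in for the extracted PDF pages.
pages = [
    "Entity ID: 101\nStatement page 1",
    "Entity ID: 101\nStatement page 2",
    "Entity ID: 202\nStatement page 1",
]
groups = group_pages(pages)
```

Each entry in `groups` is the page list for one output PDF. Pages where no ID is found end up under the `None` key, so they can be reviewed by hand.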

UiPath pdf table scraping into a DataTable type

Can I somehow import a table from pdf to a UiPath DataTable?
I can do it by loading it into a string array and then splitting it, but I hope there is a better and safer way to get a table from a PDF.
The easiest way is to use the Read PDF Text activity.
Because it changes often, it doesn't really make sense to give a tutorial here on how to add it to UiPath and which parameters it currently has. You can find all of that in the official detailed tutorial.
Basically, this activity gives you an easy way of extracting the data and managing it.
If the result is not good enough, you will need to switch to the OCR activity with nearly the same name, which reads the data visually. From what you have described, though, I would not recommend that.
There are many other activities out there, like extracting it into an Excel Workbook. So simply try out what you need.

How do I extract tables from a historical PDF?

I need to extract data from similarly formatted tables from this file. There are some OCR errors but I have an automated method to correct them.
I have tried:
ABBYY Finereader table detection.
Tabula table extraction
Camelot table extraction
Custom python code
The Problem: The commercial tools are very bad at detecting the edges of the table. The tables follow a similar general format, but each scan is aligned slightly differently, so hard-coding the borders won't work either.
Question: Do you guys know a good way to detect where the table begins and then apply one of a few templates?
Any other tips for this kind of work are greatly appreciated.
UPDATE 2/26:
I solved my own question, though feel free to respond with faster or better solutions.
One of the main problems is that the tables are roughly similar in their dimensions but they vary from page to page. The scanned images are also slightly offset from page to page, giving two alignment problems. My current workflow solves both and is as follows.
Table Type Alignment
Solution:
Use the image editing tools in ABBYY to cut each page horizontally. This gives one table on each page.
Note that there are 4 table types. Even pages and odd pages have separate layouts. The first table on each page includes a field for date.
That gives first-table-even, first-table-odd, reg-table-even, reg-table-odd. Processing one type at a time with fixed table areas and columns fixes misalignment due to differences in the table layouts.
Image Alignment
The images of the same table type are still not aligned, so specifying a table layout in (x,y) coordinates won't work. The table locations are different in each image.
I needed to align the images based on the table location, but without already detecting the table there was no good way to do that.
I solved the problem in an interesting way, but I tried the following steps first.
Detect vertical lines using Opencv. Result: did not detect faint lines well. Would often miss lines making it useless for alignment.
Use Scan Tailor to detect content. Result: The detection algorithm would crop some tables too much in some files and in others include white space because of specks in the image. Despeckling didn't help.
Use Camelot with wide table areas, no column values. Result: This would probably work well in other cases, but Camelot fell down here. The data is reported down to cents and there are spaces between every three digits. This resulted in the misplacement of the 00 in several columns.
Solution:
After cutting the images into tables as explained in the Table Type Alignment section, use the Auto-Align Layers feature in Photoshop to align the images.
Step-by-Step Solution:
Open Photoshop
Load images of one table type into a single file using: File-Scripts-Load Files into Stack
Use: Edit-Auto-align layers
Use crop tool to make each file the same size.
Export each image as its own file: File-Export-Layers to files
Use ABBYY OCR editor on each of the 4 table types, hardcode the columns and rows using GUI editor.
Export to CSV from ABBYY
Use something like clean.py to remove spaces and bad chars.
Done! Combine the files for each table however you like. I will post my python code for doing this when I'm done with the project. Once cleaned, I will post the data too.
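Until that code is posted, here is a minimal sketch of what the cleaning step could look like. It is not the author's clean.py: it assumes the numeric cells look something like `1 234 567.89` (digit groups separated by spaces, with occasional stray OCR characters), and the function names are made up for illustration.

```python
import re

# Anything that is not a digit, a decimal point, or a minus sign is treated as noise.
_NOISE = re.compile(r"[^0-9.\-]")


def clean_cell(cell):
    """Strip spaces and stray characters from one numeric cell."""
    return _NOISE.sub("", cell)


def clean_row(row):
    """Clean every cell in a row of numeric values."""
    return [clean_cell(cell) for cell in row]


# Hypothetical row as it might come out of the CSV export.
cleaned = clean_row(["1 234 567.89", " 12 000.50*", "-3 400.00"])
```

This is only meant for the numeric columns; applying it to a label column would delete the text.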
There is a free online tool here https://www.pdftron.com/pdf-tools/pdf-table-extraction/
The related blog https://www.pdftron.com/blog/parsing-extraction/table-extraction-and-pdf-to-xml-with-pdfgenie/ references PDFGenie command line tool
Instead of Camelot table_areas parameter (which specifies fixed boundaries), you can try to use table_regions parameter to specify the regions where the tables probably are (Camelot will only analyze the specified regions to look for tables).
https://camelot-py.readthedocs.io/en/master/user/advanced.html#specify-table-regions
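A minimal sketch of that call, assuming Camelot is installed: the file name and the coordinates are placeholders, and the helper names `region` and `extract_tables` are made up for the example. Regions use the same `"x1,y1,x2,y2"` string format as `table_areas` (top-left and bottom-right corners in PDF coordinate space).

```python
def region(x1, y1, x2, y2):
    """Format a region as the 'x1,y1,x2,y2' string that Camelot expects."""
    return f"{x1},{y1},{x2},{y2}"


def extract_tables(pdf_path, regions, pages="1"):
    """Look for tables only inside the given regions of the given pages."""
    import camelot  # imported here so region() works without Camelot installed

    return camelot.read_pdf(pdf_path, pages=pages, table_regions=regions)


# Placeholder coordinates; choose a box generous enough to cover the scan offsets.
regions = [region(50, 750, 560, 400)]

# Example call (not run here because it needs a real PDF):
# tables = extract_tables("scan.pdf", regions)
# print(tables[0].df)
```

Because a region only narrows where Camelot searches, it can be wider than the table itself, which may tolerate the page-to-page offsets better than fixed `table_areas`.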
Please keep us updated.

Setting text to be read column-wise by screen-reader in iText 7

I have a page in my PDF that consists of several columns. I would like the screen-reader to read each column individually before moving on to the next column. Currently it just reads the text that appears from left to right. Is there any way to do this in iText 7?
The answer depends on whether you create this document by yourself with iText or you want to fix this issue in already existing PDF document.
In the first case you simply need to specify that you want to create document logical structure along with document content. In order to achieve this, you need to call PdfDocument#setTagged() method upon creation of PdfDocument instance. Document logical structure is something that tools like screen readers would rely on in order to get the correct logical order of the contents.
In the second scenario, when you already have a document with several columns but its reading order is messed up, it is most likely that the document doesn't have a proper logical structure (in other words, it is not tagged properly). Fixing the issue you described in an existing PDF document (a task sometimes called structure recognition) is extremely difficult in the general case and cannot currently be performed automatically. There are several tools that allow you to fix such documents manually or semi-automatically (like Adobe Acrobat), but iText 7 doesn't provide structure recognition functionality right now.

Is there a way to automatically import data into a form field in Adobe Acrobat Pro?

I'm open to other solutions as well.
My issue is this. We have 500+ different PDFs (and growing) that need to have certain information (company info, phone numbers, etc.) added to form fields dynamically. The reason this needs to be dynamic is that this information changes regularly and we do not want to have to update all 500 PDFs each time it changes. So I am looking for some way to set up the PDFs so that they all read from a single external source (could be something as simple as a text file) dynamically upon opening the PDF in Acrobat Pro.
I have done some on-the-fly PDF creation in the past through PHP, however this does not seem like the best solution here as the PDFs need to be edited a lot by non-programmers and such. I'd prefer not to go that route and just stick to finding a way to get a few lines of data into the PDFs they create.
I've researched this a bit and it seems... possible, but confusing? This is the best thing I could find so far:
http://www.pdfscripting.com/public/department48.cfm
But the three solutions that it offers near the bottom all sound convoluted. Just wondering if there is something simpler that I am missing. All I really need to do is have the PDF import a few small chunks of text. Seems like it should be easy...
I think you can give http://www.codeproject.com/Tips/679606/Filling-PDF-Form-using-iText-PDF-Library a try. Hopefully it fulfills your needs.