How to insert values into an existing PDF on the fly? - asp.net-mvc-4

There is a PDF with some fields that accept values from the user (for example, a "bio data" form). How can I insert the user's input into the correct fields of the existing PDF and generate the filled PDF?
If I use iTextSharp, how can I choose the coordinates at which to print the values?
Are there any design tools for drawing rectangular fields that accept values?
I ask because my PDF template has lots of fields to be filled in from the user side.
Thanks in advance.

There are two possibilities:
Your original PDF is a form:
You can check this by checking if the PDF has any fields as explained here: convert pdf editable fields into text using java programming
You'll need to adapt the Java code to C# code or you can use RUPS as shown in my answer to the question How to get specific types from AcroFields? Like PushButtonField, RadioCheckField, etc
In this case, filling out the form is easy:
PdfStamper pdfStamper = new PdfStamper(new PdfReader(templateFile), new FileStream(fileName, FileMode.Create));
AcroFields acroFields = pdfStamper.AcroFields;
acroFields.SetField(key, value);
pdfStamper.FormFlattening = true;
pdfStamper.Close();
You can have as many lines with SetField() as you want. In these lines key is the field name as defined in the original form; value is the value you want to add at the position(s) of that field.
The line with the pdfStamper.FormFlattening is optional. If you set that value to true, all interactivity will be removed: the form will no longer be a form. If you remove the line or set that value to false, then the form will still be a form. You'll be able to change the content of the fields and extract the value of the fields.
Your original PDF is not a form:
A PDF may look like a form to the human eye, but if it doesn't have AcroForm fields (and no XFA either), then a machine won't consider it as being a form. In this case, you have to understand that all the content is fixed at fixed coordinates on the page. You can add content at absolute positions, but the original content won't move.
There are different ways to add content to an existing PDF and they all involve PdfStamper. Once you have obtained a PdfContentByte object from this PdfStamper, you can add text as explained in the documentation. Read the sections Manipulating existing PDFs and Absolute positioning of text or take a look at the content tagged with the keyword PdfStamper. The watermark examples should be interesting too.
I would advise against this second approach, as it is very hard to find the exact coordinates to use. If your PDF isn't a form, turn it into a form using Adobe Acrobat and use the first approach. The first approach is much more future-proof: if you ever have to change something in your form, you can change that form without having to change your code (provided that you preserve the original field names).
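If you do end up positioning text by hand, part of the difficulty is that measurements taken from the top-left corner of a page (as rulers and most design tools report them) have to be converted into PDF coordinates. The following is a minimal sketch of that conversion, assuming the default PDF user space (origin at the bottom-left corner, 72 points per inch) and an unrotated page. The function names and the sample measurements are made up for illustration and are not part of iTextSharp.

```python
# Sketch: convert a position measured from the top-left corner of a page
# into PDF user-space points (origin bottom-left, 72 points per inch).
# Assumes an unrotated page whose lower-left corner is at (0, 0).

POINTS_PER_INCH = 72.0
MM_PER_INCH = 25.4


def mm_to_points(mm):
    """Convert millimetres to PDF points."""
    return mm * POINTS_PER_INCH / MM_PER_INCH


def top_left_mm_to_pdf_points(x_mm, y_mm, page_height_pt):
    """Return (x, y) in points, measured from the bottom-left corner."""
    x_pt = mm_to_points(x_mm)
    # The y axis is flipped: PDF y grows upwards from the bottom edge.
    y_pt = page_height_pt - mm_to_points(y_mm)
    return x_pt, y_pt


# Hypothetical measurement: 25.4 mm from the left and 50.8 mm from the top
# of an A4 page (842 points high, rounded).
x, y = top_left_mm_to_pdf_points(25.4, 50.8, page_height_pt=842.0)
print(x, y)
```

The resulting pair is what you would pass as the x and y of an absolute text position. Every field needs its own measured pair, which is why the form-based approach is easier to maintain.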

iTextSharp lets you do this with its PdfStamper class.
Here is a sample for your reference.
// create a PdfReader instance and read the existing PDF file into it by providing its path
PdfReader pdfReader = new PdfReader(FILE_PATH);
// create a stamper instance to edit the existing file
PdfStamper pdfStamper = new PdfStamper(pdfReader, Response.OutputStream);
// perform your edit operation here.....
.
.
.
// close pdfStamper instance
pdfStamper.Close();

Related

Libre Office Labels don't show up as "AcroFields" in iTextSharp?

I've been trying to generate a report. I've tried quite a few things already, but there always seem to be problems. I'm currently trying iTextSharp 4.1.6.
My current strategy is to use LibreOffice to create a document with editable pdf fields, or I guess they are called "AcroFields". I'm not sure since I can't find a definition. But anyways, I assume that all of these are "AcroFields":
But if I put all of those into a form and export as pdf only some of them show up as AcroFields:
var reader = new PdfReader(File.ReadAllBytes("abc.pdf"));
foreach (var field in reader.AcroFields.Fields)
{
    Console.WriteLine(((DictionaryEntry)field).Key);
}
> Text Box 1
Check Box 1
Numeric Field 1
Formatted Field 1
Date Field 1
List Box 1
Combo Box 1
Push Button 1
Option Button 1
Notice how Label Field 1 is not present. If it were present then doing a text replace might be easy. Except it's not present so it's looking like even iText can't do a simple text replace in a pdf. Is this true? How would you replace text in a pdf document using iTextSharp?
Notice how Label Field 1 is not present.
As there is no AcroForm form field type "label", form labels usually are drawn as regular page content in PDF files.
If it were present then doing a text replace might be easy. Except it's not present so it's looking like even iText can't do a simple text replace in a pdf. Is this true?
Indeed, in general there is no simple text replacement in a PDF.
How would you replace text in a pdf document using iTextSharp?
I would determine the bounding box coordinates of the text to replace using the iText text extraction feature with some extension that returns text plus coordinates. Then I'd remove that text by redaction using iText's PdfCleanUp... classes. Finally I'd add the replacement text as new text in the bounding box determined at start.
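The first of those steps, finding the bounding box of the text to replace, could look roughly like the sketch below. It assumes you already have text chunks with coordinates from an extraction strategy that reports them. The Chunk class, its field names and the sample coordinates are hypothetical and only stand in for whatever your extraction code returns.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    """A piece of extracted text with its rectangle (hypothetical structure)."""
    text: str
    llx: float
    lly: float
    urx: float
    ury: float


def bounding_box_of(chunks, target):
    """Find consecutive chunks whose joined text equals target.

    Returns the union rectangle (llx, lly, urx, ury) or None if not found.
    """
    for start in range(len(chunks)):
        joined = ""
        for end in range(start, len(chunks)):
            joined += chunks[end].text
            if joined == target:
                run = chunks[start:end + 1]
                return (min(c.llx for c in run), min(c.lly for c in run),
                        max(c.urx for c in run), max(c.ury for c in run))
            if not target.startswith(joined):
                break  # this run can no longer match
    return None


# Made-up chunks, as an extraction strategy might report them.
chunks = [
    Chunk("Name: ", 50, 700, 90, 712),
    Chunk("Label ", 100, 700, 130, 712),
    Chunk("Field 1", 130, 700, 165, 712),
]
box = bounding_box_of(chunks, "Label Field 1")
print(box)
```

The rectangle found this way is the area to redact and then to fill with the replacement text. Real documents often split text into chunks in surprising ways, so exact matching like this is only a starting point.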
Unfortunately for you, both good text extraction and redaction are not present in your version 4.1.6; for this approach you should update at least to 5.5.x.
Alternatively, though, as you've been trying to generate a report, I assume the template design is in your hands. In that case you can put your labels into read-only text fields which you can change (they are read-only only to GUI users).

How to identify checkboxes in a flat pdf?

Team,
I have to validate a flattened PDF as part of a requirement. This PDF has checkboxes. I used the Apache PDFBox library to read the contents of this PDF. It only reads the text and does not identify the checkboxes. Please find attached a screenshot of a similar PDF file that I am using (Flat PDF with Checkbox).
Can you please suggest an approach to identify and validate these checkboxes?
Code snippet used:
PDFTextStripper stripper = new PDFTextStripper();
// load the document directly; creating an empty PDDocument first is unnecessary
PDDocument document = PDDocument.load(new File("D:\\test.pdf"));
stripper.setStartPage(1);
stripper.setEndPage(1);
stripper.setSortByPosition(true);
String pdfTextContent = stripper.getText(document);
System.out.println(pdfTextContent);
// release the file once the text has been extracted
document.close();

Lost some text when extracting pdf

I've tried to get all the text on the page using iText, but I have no idea why every coordinate text loses its last two characters.
PdfDocument pdfDoc = new PdfDocument(new PdfReader(@"E:\Coding\COOR.pdf"));
LocationTextExtractionStrategy strategy = new LocationTextExtractionStrategy();
PdfCanvasProcessor parser = new PdfCanvasProcessor(strategy);
parser.ProcessPageContent(pdfDoc.GetFirstPage());
Console.Write(strategy.GetResultantText());
pdfDoc.Close();
Console.WriteLine("Great!");
Console.ReadKey();
You can also download my code from
https://1drv.ms/u/s!Al1hUSZtR4OjwU3XVBRQGneVaZlS
In short
The reason for that "lost text" is that the missing "text" isn't there to start with!
In detail
The contents of your PDF file are constructed in a misleading manner.
On the one hand there are very many path definitions which then are stroked (drawn). These drawings create what you can see in a viewer, both text and table lines.
On the other hand there are a few text drawing instructions to draw text using text rendering mode 3 which is... invisible! These drawings create the text you can copy&paste in a viewer or extract using iText.
Unfortunately the text in the text drawing instructions and the text drawn using paths do not match completely. The text you retrieve via copy&paste or text extraction, therefore, differs from your expectations.
Also, the glyph sizes and positions are not exactly the same.
To illustrate this I made the text drawing instructions use the normal (fill) text rendering mode. The top left corner which originally looks like this:
with that change looks like this:
As you see the formerly invisible text is only approximately at the same position as the visible drawings, and it is somewhat broken: The symbol for degrees is weirdly represented as "¡ã", and the longitude fractional seconds and the following symbol for seconds are missing.
To correctly extract the originally visible data, you'll need to use OCR instead of text extraction.
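As a side note on the "¡ã" above: one plausible explanation (a guess, not something confirmed from the file) is that the hidden text layer stores the degree sign as the two GBK bytes 0xA1 0xE3, which were then read as single-byte Latin-1 characters. The sketch below reverses that assumed mix-up. It does not bring back the characters that are missing altogether, so OCR is still needed for those.

```python
# Sketch: undo a suspected GBK-read-as-Latin-1 mix-up.
# Assumption: the two characters are really the GBK bytes of one symbol.
garbled = "\u00a1\u00e3"                 # the two characters shown as "¡ã"
raw_bytes = garbled.encode("latin-1")    # back to the underlying bytes
repaired = raw_bytes.decode("gbk")       # reinterpret them as GBK
print(repaired)
```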

Actually cropping a PDF with PDF Clown

My objective is to actually crop a PDF file with PdfClown.
There are a lot of tools and libraries that allow cropping a PDF by changing its CropBox. This hides the content outside a rectangular area, but that content is still there: it can be accessed through a PDF parser, and the PDF size does not change.
What I need instead is to create a new page containing only the content inside the rectangular area.
So far I've tried scanning the contents and selectively cloning them, but I haven't succeeded yet. Any suggestions on using PdfClown for that?
I've seen that someone is trying something similar with PDFBox (Cropping a region from a PDF page with PDFBox), without success so far.
A bit late, but maybe it helps someone.
I am successfully doing what you are asking for, but with other libraries.
Required libraries: iText 4 or 5 and Ghostscript
Step 1 with pseudo code
Using iText, create a PdfWriter instance with a blank document. Open a PdfReader on the original file you want to crop. Import the page, get a PdfTemplate object from the source, set its BoundingBox property to the desired crop box, wrap the template in an iText Image object and paste it onto the new page at an absolute position.
Dim reader As New PdfReader(sourcefile)
Dim doc As New Document()
Dim writer As PdfWriter = PdfWriter.GetInstance(doc, New System.IO.FileStream(outputfilename, System.IO.FileMode.Create))
' the document must be open before content can be added
doc.Open()
' get the source page as an imported page
Dim page As PdfImportedPage = writer.GetImportedPage(reader, indexOfPageToGet)
' create a PdfTemplate object at original size from source - see iText in Action book page 91 for full details
Dim pdftemp As PdfTemplate = page.CreateTemplate(page.Width, page.Height)
' paste the original page onto the template object; see the iText documentation for what those parameters do (scaling, mirroring)
pdftemp.AddTemplate(page, 1, 0, 0, 1, 0, 0)
' now the critical part - set the BoundingBox property on the template. This makes all objects outside the rectangle invisible
' (cropBox is a placeholder for an iText Rectangle holding the new crop box)
pdftemp.BoundingBox = cropBox
' template not needed anymore
writer.ReleaseTemplate(pdftemp)
' create an iText Image object as a wrapper for the template - with this img object absolute positioning on the final page is much easier
Dim img As iTextSharp.text.Image = Image.GetInstance(pdftemp)
' set img position
img.SetAbsolutePosition(x, y)
' set optional rotation if needed
img.RotationDegrees = 0
' finally, this adds the actual content to the new document
doc.Add(img)
' cleanup
doc.Close()
reader.Close()
writer.Close()
The output file will look cropped, but the objects are still present in the PDF stream, so the file size will probably change very little at this point.
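The x and y passed to SetAbsolutePosition in Step 1 are not spelled out above. Here is a small sketch of how they could be derived. It assumes that the template is drawn unscaled, so that the original page coordinates are simply shifted by the absolute position. The function name and the sample numbers are made up for illustration, and the result should be checked against your own files.

```python
def crop_placement(crop_llx, crop_lly, crop_urx, crop_ury):
    """Return (page_width, page_height, image_x, image_y) for a crop box.

    Assumes an unscaled template: content at original (px, py) ends up
    at (image_x + px, image_y + py) on the new page.
    """
    if crop_urx <= crop_llx or crop_ury <= crop_lly:
        raise ValueError("crop box must have positive width and height")
    page_width = crop_urx - crop_llx
    page_height = crop_ury - crop_lly
    # Shift the content so that the crop box's lower-left corner lands on
    # the origin of the new page.
    return page_width, page_height, -crop_llx, -crop_lly


# Hypothetical crop box in points.
placement = crop_placement(100, 200, 400, 600)
print(placement)
```

With these values the new document would get a page of the returned width and height, and the image would be placed at the returned (negative) offsets.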
Step 2:
Using Ghostscript with the pdfwrite output device and the correct command line parameters, you can re-process the PDF from Step 1. This will give you a much smaller PDF. See the Ghostscript documentation for the arguments: https://www.ghostscript.com/doc/9.52/Use.htm
This step actually gets rid of the objects that are outside the bounding box, which is the requirement in your original post, at least for the files that I deal with.
Optional Step 3:
Using mutool with the -g option, you can clean up unused xref objects. Your original PDF probably had a lot of xref entries, which increase the file size. After cropping, some of them may no longer be needed.
https://mupdf.com/docs/manual-mutool-clean.html
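Steps 2 and 3 could be scripted along the lines of the sketch below. The exact parameters used for Ghostscript are not given above, so the minimal ones here (-o for the output file, -sDEVICE=pdfwrite) are an assumption, as are the executable names (on Windows the Ghostscript console binary usually has a different name, such as gswin64c) and the file names.

```python
import shutil
import subprocess


def ghostscript_command(src, dst, executable="gs"):
    """Build a minimal Ghostscript call that rewrites src through pdfwrite."""
    return [executable, "-o", dst, "-sDEVICE=pdfwrite", src]


def mutool_clean_command(src, dst, executable="mutool"):
    """Build a 'mutool clean -g' call that drops unused objects."""
    return [executable, "clean", "-g", src, dst]


def run_if_available(cmd):
    """Run cmd if its executable is on PATH; return whether it was run."""
    if shutil.which(cmd[0]) is None:
        return False
    subprocess.run(cmd, check=True)
    return True


# Hypothetical file names for the output of Step 1 and the later steps.
step2 = ghostscript_command("step1.pdf", "step2.pdf")
step3 = mutool_clean_command("step2.pdf", "step3.pdf")
print(step2)
print(step3)
```

Calling run_if_available(step2) and then run_if_available(step3) would execute the two steps in order, provided both tools are installed.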
PDF is a tricky format. Normally I would agree with @Tilman Hausherr: my suggestion may not work for all files and covers the 'almost impossible' scenario, but it works for all cases that I deal with.

How to draw Matrix Code with PdfSharp?

I need to make a PDF report via PdfSharp. The report must include a QRCode, or data matrix code, but I can't seem to be able to draw it on the page.
The values it's asking for are value as String and length as Integer so here's what I'm doing:
Dim myNewCode As New PdfSharp.Drawing.BarCodes.CodeDataMatrix("1234567890", 10)
Then I try to draw it:
gfx.DrawMatrixCode(myNewCode, myXPoint)
It asks for an XPoint location so I set it to this:
Dim myXPoint As New XPoint(500,500)
Which only needs values for x and y.
It compiles OK, but when I try to open the file I get the following error:
An error exists on this page. Acrobat may not display the page correctly. Please contact the person who created the PDF document to correct the problem
My Acrobat version is 11.0.5, and there is no problem opening other PDF files which already contain these kind of codes.
Specify the size to get a correct PDF file:
var myXSize = new XSize(100, 100);
var myNewCode = new PdfSharp.Drawing.BarCodes.CodeDataMatrix("1234567890", 10, myXSize);
var myXPoint = new XPoint(200, 300);
gfx.DrawMatrixCode(myNewCode, myXPoint);
Please note that due to legal reasons, the open source version of PDFsharp does not include the implementation of the Data Matrix Code and shows dummy images instead.
Another option would be to use a 3rd party library (ZXing) to generate the QR Code bitmap and draw it as a bitmap with DrawImage() on the PDF.