I am trying to crop left side of pdf to 10 mm. i used below code
public void TrimLeft(string sourceFilePath, string outputFilePath)
{
PdfReader pdfReader = new PdfReader(sourceFilePath);
float width =(float) GetPDFwidth(sourceFilePath);
float height = (float)GetPDFHeight(sourceFilePath);
float widthTo_Trim = iTextSharp.text.Utilities.MillimetersToPoints(10);
// Set which part of the source document will be copied.
// PdfRectangel(bottom-left-x, bottom-left-y, upper-right-x, upper-right-y)
PdfRectangle rect = new PdfRectangle(0, 0, width - widthTo_Trim, height);
PdfRectangle rectLeftside = new PdfRectangle(0,0,width - widthTo_Trim, height);
using (var output = new FileStream(outputFilePath, FileMode.CreateNew, FileAccess.Write))
{
// Create a new document
Document doc = new Document();
// Make a copy of the document
PdfSmartCopy smartCopy = new PdfSmartCopy(doc, output);
// Open the newly created document
doc.Open();
// Loop through all pages of the source document
for (int i = 1; i <= pdfReader.NumberOfPages; i++)
{
// Get a page
var page = pdfReader.GetPageN(i);
// Apply the rectangle filter we created
page.Put(PdfName.CROPBOX, rectLeftside);
page.Put(PdfName.MEDIABOX, rectLeftside);
// Copy the content and insert into the new document
var copiedPage = smartCopy.GetImportedPage(pdfReader, i);
smartCopy.AddPage(copiedPage);
}
// Close the output document
doc.Close();
}
}
Its croping RHS of pdf. i tried with changing the coordinates
PdfRectangle rectLeftside = new PdfRectangle(0,0,width - widthTo_Trim, height);
but unable to get desired result.
How can i crop X mm left side
Making the hint in the comments an actual answer...
You create the new crop box rectangle like this:
PdfRectangle rectLeftside = new PdfRectangle(0,0,width - widthTo_Trim, height);
The constructor in question is:
/**
* Constructs a <CODE>PdfRectangle</CODE>-object.
*
* #param llx lower left x
* #param lly lower left y
* #param urx upper right x
* #param ury upper right y
*/
...
public PdfRectangle(float llx, float lly, float urx, float ury)
Thus, assuming your original PDF has a crop box with lower left coordinates (0,0), your code manipulates the upper right x, i.e. the right side of the box. You, on the other hand, actually want to manipulate the left side. Thus, you should use something like:
PdfRectangle rectLeftside = new PdfRectangle(widthTo_Trim, 0, width, height);
After the hint in comments, this also was the OP's solution.
Additional improvements
Using a PdfStamper
The OP uses a PdfSmartCopy instance and its method GetImportedPage to crop left side of pdf. While this already is better than the use of a plain PdfWriter for this task, the best choice for manipulating a single PDF usually is a PdfStamper: You don't have to copy anything anymore, you merely apply the changes. Furthermore the result internally is more like the original.
Determining boxes page by page
The OP in his code assumes
a constant size of all pages in the PDF (which he determines in his methods GetPDFwidth and GetPDFHeight) and
constant lower left coordinates (0,0) of the current crop box on all pages.
Neither of these assumptions is true for all PDFs. Thus, one should retrieve and manipulate the crop box of each page separately.
The code
public void TrimLeftImproved(string sourceFilePath, string outputFilePath)
{
PdfReader pdfReader = new PdfReader(sourceFilePath);
float widthTo_Trim = iTextSharp.text.Utilities.MillimetersToPoints(10);
using (FileStream output = new FileStream(outputFilePath, FileMode.Create, FileAccess.Write))
using (PdfStamper pdfStamper = new PdfStamper(pdfReader, output))
{
for (int page = 1; page <= pdfReader.NumberOfPages; page++)
{
Rectangle cropBox = pdfReader.GetCropBox(page);
cropBox.Left += widthTo_Trim;
pdfReader.GetPageN(page).Put(PdfName.CROPBOX, new PdfRectangle(cropBox));
}
}
}
Related
So... I've been trying to use the example provided in the documentation of itext for merging documents and creating a TOC for the merged result. But the part that adds page number text to every page isn't working as I would expect. What happens is that the text added gets flipped over some horizontal axis as shown in the next picture:
Also, the java doc for the method used to set a fixed position to the added text (public T setFixedPosition(int pageNumber, float left, float bottom, float width)) doesn't make sense to me:
Sets values for a absolute repositioning of the Element. The coordinates specified correspond to the bottom-left corner of the element and it grows upwards.
But when I run setFixedPosition(pageNumber, 0, 0, 50) the text ends up in the upper left corner, again also flipped. And if I use the width and height from the page size of the source PdfDocument as parameters for left and bottom positions respectively it doesn't even reach bottom right corner.
I might be doing something wrong or misunderstanding something. Either way, here is the code I'm using:
private static int copyPdfPages(PdfDocument source, Document document, Integer start, Integer pages, Integer number) {
int oldC;
int max = start + pages - 1;
Text text;
for (oldC = start; oldC <= max; oldC++) {
text = new Text(String.format("Page %d", number));
PageSize pageSize = source.getDefaultPageSize();
source.copyPagesTo(oldC, oldC, document.getPdfDocument());
document.add(new Paragraph(text).setBorder(new SolidBorder(ColorConstants.RED, 1))
.setFixedPosition(number++, pageSize.getWidth() - 55, pageSize.getHeight() - 30, 50));
}
return oldC - start;
}
public static void main(String[] args) throws IOException {
String path = "/path/to/target";
FileOutputStream fos = new FileOutputStream(path);
PdfDocument pdfDocTgt = new PdfDocument(new PdfWriter(fos));
Document document = new Document(pdfDocTgt);
PdfDocument pdfDocSrc = new PdfDocument(new PdfReader(new FileInputStream("path/to/source")));
copyPdfPages(pdfDocSrc, document, 1, pdfDocSrc.getNumberOfPages(), 1);
pdfDocTgt.close();
pdfDocSrc.close();
document.flush();
document.flush();
fos.flush();
fos.close();
}
And here is the pdf source: https://drive.google.com/open?id=11_9ptuoRqS91hI3fDcs2FRsIUEiX0a84
Help please (and sorry about my english).
The problem
The problem is that Document.add assumes that the instructions in the current content of the current page at its end have the graphics state essentially restored to its initial state (or else that the effects of the differences on the output are desired).
In your sample PDF this assumption is not satisfied, in particular the page content instructions start with
0.750000 0.000000 0.000000 -0.750000 0.000000 841.920044 cm
which changes the current transformation matrix to
scale everything down to 75% and
flip the coordinate system vertically.
The former change causes your addition to not be in a page corner but instead instead somewhere more to the center; the latter causes it to be vertically mirrored and more to the bottom instead of to the top of the page.
The fix
If one does not know whether the current contents of the page have an essentially restored graphics state at the end (usually the case if one processes page contents one has not generated oneself), one should refrain from adding content via a Document instance but instead use a PdfCanvas generated with a constructor that wraps the current page content in a save-graphics-state ... restore-graphics-state envelop.
E.g. for your task:
private static int copyPdfPagesFixed(PdfDocument source, PdfDocument target, int start, int pages, int number) {
int oldC;
int max = start + pages - 1;
Text text;
for (oldC = start; oldC <= max; oldC++) {
text = new Text(String.format("Page %d", number));
source.copyPagesTo(oldC, oldC, target);
PdfPage newPage = target.getLastPage();
Rectangle pageSize = newPage.getCropBox();
try ( Canvas canvas = new Canvas(new PdfCanvas(newPage, true), target, pageSize) ) {
canvas.add(new Paragraph(text).setBorder(new SolidBorder(ColorConstants.RED, 1))
.setFixedPosition(number++, pageSize.getWidth() - 55, pageSize.getHeight() - 30, 50));
}
}
return oldC - start;
}
(AddPagenumberToCopy method)
The PdfCanvas constructor used above is documented as
/**
* Convenience method for fast PdfCanvas creation by a certain page.
*
* #param page page to create canvas from.
* #param wrapOldContent true to wrap all old content streams into q/Q operators so that the state of old
* content streams would not affect the new one
*/
public PdfCanvas(PdfPage page, boolean wrapOldContent)
Used like this
try ( PdfDocument pdfDocSrc = new PdfDocument(new PdfReader(SOURCE));
PdfDocument pdfDocTgt = new PdfDocument(new PdfWriter(TARGET)) ) {
copyPdfPagesFixed(pdfDocSrc, pdfDocTgt, 1, pdfDocSrc.getNumberOfPages(), 1);
}
(AddPagenumberToCopy test testLikeAibanezFixed)
the top of the first result page looks like this:
Using C# and Winforms, I want to display a PDF, select a rectangular region, and then extract that area of text from a number of PDFs. For displaying the PDF, I have a number of options...
Use an "Adobe PDF Reader" control to display the PDF - However, I cant use mouseover events and according to https://forums.adobe.com/thread/1640606 its just not possible to select a region.
Use a "WebBrowser" control to display the PDF, but it appears I have the same issue with mouseover events and cannot select a region.
Convert the PDF to an image (using ghostscript in my case) and displaying it in a picturebox. I'm finding the most success here, as I can now generate and record the coordinates of a rectangular region. When I take these coordinates and apply them to the PDF using Itext, I don't think my rectangular region translates correctly.
My question is, How do I render the GhostScripted image in a picture box maintaining the same ratios so that my coordinates will line up with the PDF?
Thank you in advance for the down votes!!
Here is the current state of my code... Everything works with the exception that my units are off in space somewhere. The action DOES return text, but it's never the text I selected. Im sure its a combination of the coordinate system / units and I will continue to try to understand this.
---- update
With a PDF at 0 deg rotation (portrait), I think the following holds true, or is at least working for me right now... User Units having not been changed, the coordinates taken from selecting in the picturebox need adjusting. The Y coordinates need to be subtracted from the overall height while the X coordinate remains the same.
iTextSharp.text.Rectangle rect = new iTextSharp.text.Rectangle(first.X, 3024-first.Y, last.X, 3024-last.Y);
This is picking text up exactly as expected on 0 deg rotated PDFs. On 90 deg rotated PDFs, the X and Y coordinates just need to be swapped.I am updating the code snippet below to show my working example.
using System;
using System.Drawing;
using System.Windows.Forms;
using Ghostscript.NET.Rasterizer;
using iTextSharp.text.pdf;
using iTextSharp.text.pdf.parser;
namespace formPdf
{
public partial class Form1 : Form
{
public Form1()
{
InitializeComponent();
}
string fileName; // The filename of the pdf
float width; // The width of the PDF in pixels
float hight; // the Height of the PDF in pixels
float rotation; // the Rotation of the PDF 0 or 90
float llx = 0; // The Lower Left X value for applying to the PDF
float lly = 0; // the Lower Left Y value for applying to the PDF
float urx = 0; // the Upper Right X value for applying to the PDF
float ury = 0; // the Upper Right Y value for applying to the PDF
// OnCLick event to open the file browser and select a file... The Width, Height and rotation values are set and the program
// is directed to render the First page of the pdf by calling the setPicture function
private void button1_Click(object sender, EventArgs e)
{
OpenFileDialog openFileDialog1 = new OpenFileDialog();
if (openFileDialog1.ShowDialog() == DialogResult.OK)
{
try
{
fileName = openFileDialog1.FileName;
PdfReader reader = new PdfReader(fileName);
iTextSharp.text.Rectangle dim = reader.GetPageSizeWithRotation(1);
width = dim.Width;
hight = dim.Height;
rotation = dim.Rotation;
setPicture(openFileDialog1.FileName);
} catch
{
// do nothing for now
}
}
}
// Using Ghostscript, the image is rendered to a picturebox. DPIs are set assuming the PDF default value is used
private void setPicture(string fileName)
{
GhostscriptRasterizer rasterizer = new GhostscriptRasterizer();
rasterizer.Open(fileName);
Image img = rasterizer.GetPage(72, 72, 1);
pictureBox1.SizeMode = PictureBoxSizeMode.AutoSize;
pictureBox1.Image = img;
}
// Declare point variables for the user defined rectangle indicating the locatoin of the PDF to be searched...
Point first = new Point();
Point last = new Point();
// The first point is collected on the MouseDown event
private void pictureBox1_MouseDown(object sender, MouseEventArgs e)
{
first = e.Location;
}
// The second point is collected on the mouse down event. Points to be applied to the PDF are adjusted based on the rotation of the PDF.
private void pictureBox1_MouseUp(object sender, MouseEventArgs e)
{
last = e.Location;
if (rotation == 0)
{
llx = first.X;
lly = hight - first.Y;
urx = last.X;
ury = hight - last.Y;
} else if(rotation == 90) {
llx = first.Y;
lly = first.X;
urx = last.Y;
ury = last.X;
}
gettext();
}
// the original PDF is opened with Itext and the text is extracted from t he defined location...
private void gettext()
{
PdfReader reader = new PdfReader(fileName);
iTextSharp.text.Rectangle rect = new iTextSharp.text.Rectangle(llx, lly, urx, ury);
RenderFilter[] renderfilter = new RenderFilter[1];
renderfilter[0] = new RegionTextRenderFilter(rect);
ITextExtractionStrategy textExtractionStrategy = new FilteredTextRenderListener(new LocationTextExtractionStrategy(), renderfilter);
string text = PdfTextExtractor.GetTextFromPage(reader, 1, textExtractionStrategy);
iTextSharp.text.Rectangle mediabox = reader.GetPageSizeWithRotation(1);
MessageBox.Show("", text+" "+mediabox+" "+first+" "+last);
}
// Image Controls....
}
}
I'm using iTextSharp to add a watermark to existing PDF files. When viewing these files on screen, the watermark and PDF appears correctly. However, when printing on certain printers (2 out of 3 that have been tested by the client), the watermark seems to interfere with the content that is above it causing it to appear malformed when printed.
The PDF's are of CAD drawings (i.e. electrical circuit drawings etc.)
The code being used to apply the watermark is below. The PdfContentByte is retrieved by a call to GetOverContent from the class PdfStamper
private void AddWaterMark(PdfContentByte dc, string text, BaseFont font, float fontSize, float angle, BaseColor color, Rectangle realPageSize, Rectangle rect = null)
{
var gstate = new PdfGState { FillOpacity = 0.1f, StrokeOpacity = 0.3f };
dc.SaveState();
dc.SetGState(gstate);
dc.SetColorFill(color);
dc.BeginText();
dc.SetFontAndSize(font, fontSize);
var ps = rect ?? realPageSize; /*dc.PdfDocument.PageSize is not always correct*/
var x = (ps.Right + ps.Left) / 2;
var y = (ps.Bottom + ps.Top) / 2;
dc.ShowTextAligned(Element.ALIGN_CENTER, text, x, y, angle);
dc.EndText();
dc.RestoreState();
}
Here is an example of what it looks like on screen:
Here is what it looks like when printed on some of the printers:
Can anyone give me any idea on why certain printers will not print the PDF correctly and if I can change the way I apply the watermarks to avoid this issue?
PDF stamping works for nearly every document I have tried. However, a client scanned some pages and his computer generated a PDF document that is resistant to stamping. The embedded image files are in JBIG2 format, but I am not sure if that is important. I have debugged the PDF with Apache's pdfbox, and I can see the text is embedded. It just doesn't show up.
Here is the PDF that won't stamp: http://demo.clearvillageinc.com/plans.pdf
And my code:
static void Main(string[] args) {
string stamp = "<div style=\"color:#F00;\">Reviewed for Code Compliance</div>";
string fileName = #"C:\temp\source.pdf";
string outputFileName = #"C:\temp\source-output.pdf";
// Open a destination stream.
using (var destStream = new System.IO.MemoryStream()) {
using (var sourceReader = new PdfReader(fileName)) {
// Convert the HTML into a stamp.
using (var stampData = FromHtml(stamp)) {
using (var stampReader = new PdfReader(stampData)) {
using (var stamper = new PdfStamper(sourceReader, destStream)) {
stamper.Writer.CloseStream = false;
// Add the stamp stream to the source document.
var stampPage = stamper.GetImportedPage(stampReader, 1);
// Process all of the pages in the source document.
for (int i = 1; i <= sourceReader.NumberOfPages; i++) {
var canvas = stamper.GetOverContent(i);
canvas.AddTemplate(stampPage, 0, -50);
}
}
}
}
}
// Finished. Save the file.
using (var fs = new System.IO.FileStream(outputFileName, FileMode.Create)) {
destStream.Position = 0;
destStream.CopyTo(fs);
}
}
}
public static System.IO.Stream FromHtml(string html) {
var ms = new System.IO.MemoryStream();
// Convert html to pdf.
using (var document = new iTextSharp.text.Document()) {
var writer = iTextSharp.text.pdf.PdfWriter.GetInstance(document, ms);
writer.CloseStream = false;
document.Open();
using (var sr = new System.IO.StringReader(html)) {
XMLWorkerHelper.GetInstance().ParseXHtml(writer, document, sr);
}
}
ms.Position = 0; // Reset for reading.
return ms;
}
One part of a page definition is the "MediaBox" which controls the page's size. This property takes two locations that specify the coordinates of two opposite corners of a rectangle. Although not required, most PDFs specify the lower left corner first followed by the upper right corner. Also, most PDF use 0x0 for the lower left and then whatever the page's width and height for the top corner. So an 8.5x11 inch PDF would be 0,0 and 612,792 (8.5 * 72 = 612 and 11 * 72 = 792) and this would be written as 0,0,612,792.
Your scanned PDF, however, has for whatever reason decided to treat 0,7072 as the lower left corner and 614,7864 as the top right corner. That still gives us (almost) an 8.5x11 page size but if you try to draw something at 0,0 it will be 7,072 pixels below the actual page. You can see this in Acrobat Pro by zooming out very far (1% for me), picking Tools, Edit Object and then doing a Select All. You should see something way far down selected, too.
To get around this, you need to respect the page's boundaries.
for (int i = 1; i <= sourceReader.NumberOfPages; i++) {
//Get the page to be stamped
var pageToBeStamped = sourceReader.GetPageSize(i);
var canvas = stamper.GetOverContent(i);
//Offset our new page by 50 pixels off of the destination page's bottom
canvas.AddTemplate(stampPage, pageToBeStamped.Left, pageToBeStamped.Bottom - 50);
}
The code above gets the rectangle for the imported page and uses bottom offset by 50 pixels (from your original code). Also, although not a problem in your case, we use the imported page's actual left edge instead of just zero.
This code can still break, however. The math in the first paragraph uses 72 which is the default for PDFs but this can be changed. Most people don't change it but most people also don't change 0,0. Currently your -50 assumes the 72 which gives the visual perception of moving the stamp about seven-tenths of an inch from the top edge. If you run into this scenario you'll want to look into retrieving the user unit.
Also, as I said in the first paragraph, most applications use lower left upper right but this isn't a hard rule. Someone could specify upper right and bottom left or even top left and bottom right. This is a hard one to take into account but it is something that you should at least be aware of.
I'd like to insert a PDF page in another PDF page scaled. I'd like to use iTextSharp for this.
I have a vector drawing which can be exported as a single page PDF file. I would like to add this file into a page of other PDF document just like I would add an image to a PDF document.
Is this possible?
The purpose of this is to retain the ability to zoom in without losing quality.
It is very hard to reproduce the vector drawing using PDF vectors because it is an extremely complex drawing.
Exporting the vector drawing as high resolution image is not an option since I have to use a lot of them in a single PDF document. The final PDF would be very large and its writing too slow.
This is relatively easy to do although there's a couple of ways to go about it. If you're creating a new document that has the other documents inside of it and nothing else then the easiest thing to use is probably the PdfWriter.GetImportedPage(PdfReader, Int). This will give you a PdfImportedPage (which inherits from PdfTemplate). Once you have that you can add it to your new document by using PdfWriter.DirectContent.AddTemplate(PdfImportedPage, Matrix).
There's a couple of overloads to AddTemplate() but the easiest one (at least for me) is the one that takes a System.Drawing.Drawing2D.Matrix. If you use this you can easily scale and translate (change x,y) without having to think in "matrix" terms.
Below is sample code that shows this off. It targets iTextSharp 5.4.0 although it should work pretty much the same with 4.1.6 if you remove the using statements. It first creates a sample PDF with 12 pages with random background colors. Then it creates a second document and adds each page from the first PDF scaled by 50% so that 4 old pages fit onto 1 new page. See the code comments for further details. This code assumes that all pages are the same size, you might need to perform further calculations if your situation differs.
//Test files that we'll be creating
var file1 = Path.Combine(Environment.GetFolderPath(Environment.SpecialFolder.Desktop), "File1.pdf");
var file2 = Path.Combine(Environment.GetFolderPath(Environment.SpecialFolder.Desktop), "File2.pdf");
//For test purposes we'll fill the pages with a random background color
var R = new Random();
//Standard PDF creation, nothing special here
using (var fs = new FileStream(file1, FileMode.Create, FileAccess.Write, FileShare.None)) {
using (var doc = new Document()) {
using (var writer = PdfWriter.GetInstance(doc, fs)) {
doc.Open();
//Create 12 pages with text on each one
for (int i = 1; i <= 12; i++) {
doc.NewPage();
//For test purposes fill the page with a random background color
var cb = writer.DirectContentUnder;
cb.SaveState();
cb.SetColorFill(new BaseColor(R.Next(0, 256), R.Next(0, 256), R.Next(0, 256)));
cb.Rectangle(0, 0, doc.PageSize.Width, doc.PageSize.Height);
cb.Fill();
cb.RestoreState();
//Add some text to the page
doc.Add(new Paragraph("This is page " + i.ToString()));
}
doc.Close();
}
}
}
//Create our combined file
using (var fs = new FileStream(file2, FileMode.Create, FileAccess.Write, FileShare.None)) {
using (var doc = new Document()) {
using (var writer = PdfWriter.GetInstance(doc, fs)) {
//Bind a reader to the file that we created above
using (var reader = new PdfReader(file1)) {
doc.Open();
//Get the number of pages in the original file
int pageCount = reader.NumberOfPages;
//Loop through each page
for (int i = 0; i < pageCount; i++) {
//We're putting four original pages on one new page so add a new page every four pages
if (i % 4 == 0) {
doc.NewPage();
}
//Get a page from the reader (remember that PdfReader pages are one-based)
var imp = writer.GetImportedPage(reader, (i + 1));
//A transform matrix is an easier way of dealing with changing dimension and coordinates on an rectangle
var tm = new System.Drawing.Drawing2D.Matrix();
//Scale the image by half
tm.Scale(0.5f, 0.5f);
//PDF coordinates put 0,0 in the bottom left corner.
if (i % 4 == 0) {
tm.Translate(0, doc.PageSize.Height); //The first item on the page needs to be moved up "one square"
} else if (i % 4 == 1) {
tm.Translate(doc.PageSize.Width, doc.PageSize.Height); //The second needs to be moved up and over
} else if (i % 4 == 2) {
//Nothing needs to be done for the third
} else if (i % 4 == 3) {
tm.Translate(doc.PageSize.Width, 0); //The fourth needs to be moved over
}
//Add our imported page using the matrix that we set above
writer.DirectContent.AddTemplate(imp,tm);
}
doc.Close();
}
}
}
}
In addition; while i was trying to add a rotated pdf to a rotated pdf, i got some rotation problems. Kind of confusing but you should check the "PdfImportedPage.Rotation" of the page which is gonna be added to pdf.
PdfImportedPage page;//page = writer.GetImportedPage(PdfReader reader, int pageNum);
PdfContentByte pcb;//pcb = PdfWriter.DirectContentUnder;
//create matrix to use for rotating imported page
Matrix matrix = new Matrix(a, b, c, d, e, f);
matrix.Rotate(-(page.Rotation));
if (page.Rotation != 0)
pcb.AddTemplate(page, matrix, true);
else
pcb.AddTemplate(page, a, b, c, d, e, f, true);
code looks like silly but i want to get your attention on "matrix.Rotate(negative rotation of imported page)"