How to add a watermark to an existing PDF file - watermark

I am trying to add a watermark to a PDF file using PdfSharp. I tried the sample at this link:
http://www.pdfsharp.net/wiki/Watermark-sample.ashx
but I can't work out how to get the page object of an existing PDF file or how to draw the watermark on that page.
Any help?

Basically, the samples are only snippets. You can download the source and with that you get a bunch of samples, including this watermark example.
The following comes from PDFSharp-MigraDocFoundation-1_32/PDFsharp/samples/Samples C#/Based on GDI+/Watermark/Program.cs
Quite simple, really ... I am only showing the code up to the for loop that goes over each page. You should have a look at the full file.
[...]
const string watermark = "PDFsharp";
const int emSize = 150;
// Get a fresh copy of the sample PDF file
const string filename = "Portable Document Format.pdf";
File.Copy(Path.Combine("../../../../../PDFs/", filename),
Path.Combine(Directory.GetCurrentDirectory(), filename), true);
// Create the font for drawing the watermark
XFont font = new XFont("Times New Roman", emSize, XFontStyle.BoldItalic);
// Open an existing document for editing and loop through its pages
PdfDocument document = PdfReader.Open(filename);
// Set version to PDF 1.4 (Acrobat 5) because we use transparency.
if (document.Version < 14)
document.Version = 14;
for (int idx = 0; idx < document.Pages.Count; idx++)
{
//if (idx == 1) break;
PdfPage page = document.Pages[idx];
[...]

Related

Add multi-page watermark to PDF

I have a merged PDF that contains only an AcroForm, and I want to add a watermark to it.
I add watermark_1.pdf to pages 1, 3, 5.
I add watermark_2.pdf to pages 2, 4, 6.
This works well with the following code:
PDDocument doc = PDDocument.load(new File(filePath));
HashMap<Integer, String> overlayGuide = new HashMap<Integer, String>();
for (int i = 0; i < doc.getNumberOfPages(); i = i + 2) {
overlayGuide.put(i + 1, FOLDER + "watermark_1.pdf");
overlayGuide.put(i + 2, FOLDER + "watermark_2.pdf");
}
Overlay overlay = new Overlay();
overlay.setInputPDF(doc);
overlay.setOverlayPosition(Overlay.Position.BACKGROUND);
overlay.overlay(overlayGuide);
doc.save(new File(filePath));
But this approach forces me to split my watermark into two files, while my original watermark is a single two-page file (see watermark_2pages.pdf).
My question: is it possible to configure the PDFBox overlay to use watermark_2pages.pdf and apply it to pages 1 and 2 of my form directly (i.e. without having two separate watermark PDF files)?
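One way to think about the single-file case is as a mapping from each document page to the watermark page that should sit behind it. The sketch below is plain Java and does not call PDFBox at all; the class name `WatermarkGuide` and the method `build` are hypothetical, and how the resulting mapping is handed to `Overlay` depends on the PDFBox version in use. Unlike the loop above, it never produces an entry past the last page when the page count is odd.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of PDFBox.
class WatermarkGuide {
    // Maps each 1-based document page to the 1-based watermark page
    // that should be placed behind it, cycling through the watermark pages.
    static Map<Integer, Integer> build(int documentPages, int watermarkPages) {
        if (documentPages < 0 || watermarkPages < 1) {
            throw new IllegalArgumentException("invalid page counts");
        }
        Map<Integer, Integer> guide = new LinkedHashMap<>();
        for (int page = 1; page <= documentPages; page++) {
            guide.put(page, (page - 1) % watermarkPages + 1);
        }
        return guide;
    }

    public static void main(String[] args) {
        // Five-page form, two-page watermark file.
        System.out.println(build(5, 2));
    }
}
```

With a two-page watermark this reproduces the odd/even split from the question: odd pages map to watermark page 1 and even pages to watermark page 2.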

Remove object in PDF with iTextSharp and save

This is a case of OCR gone wrong. I need to remove the hidden text from a PDF and I'm having a hard time figuring out how to do it.
The hidden text resides in an object always named /QuickPDFsomething, which sits under an /XObject dictionary inside the page's /Resources dictionary.
I have tried these two things and neither has worked so I'm clearly doing something wrong.
Option 1 - Kill obj - The PDF won't open in Acrobat and states, 'An error exists on this page. Acrobat may not display the page correctly' but it looks ok. Pitstop pukes with 'Critical parser failure: XObject resource missing'.
PdfReader.KillIndirect(obj);
oPdfFile.GetPdfReader().RemoveUnusedObjects();
var stamper = new PdfStamper(oPdfFile.GetPdfReader(), new FileStream(@"C:\temp.pdf", FileMode.Create));
stamper.Close();
Option 2 - CleanupProcessor - Throws an exception about 'A Graphics object cannot be created from an image that has an indexed pixel format'.
var stamper = new PdfStamper(oPdfFile.GetPdfReader(), new FileStream(@"C:\temp.pdf", FileMode.Create));
var cleanupLocations = new List<PdfCleanUpLocation>();
var pageRect = oPdfFile.GetPdfReader().GetCropBox(1);
cleanupLocations.Add(new PdfCleanUpLocation(1, pageRect));
PdfCleanUpProcessor cleaner = new PdfCleanUpProcessor(cleanupLocations, stamper);
cleaner.CleanUp();
stamper.Close();
I'd like to remove the /QuickPDF object (41 0 R, in this image) as well as remove it from the content stream that calls it with /QuickPDF Do.
Unfortunately I cannot provide the PDF.
Any tips on how to do this?
I hate to answer my own question, but I wanted to share the solution I found in case others need it.
After playing around with this for a couple of days, I figured out that Option 1 above does indeed remove the object, and that the exception I was getting from PitStop occurred because the content stream still had a reference to the /QuickPDF XObject.
So I tried following @mkl's solution in "Removing Watermark from PDF iTextSharp", but it kept putting unwanted data in the content stream that rotated my PDF.
Then I found @Chris's solution in "Removing Watermark from a PDF using iTextSharp", and it seems to work, although I'm not sure how stable this solution will be.
This is my solution for removing /QuickPDF from the content stream:
int numPages = oPdfFile.GetPdfReader().NumberOfPages;
int pgNumber = 1;
PdfDictionary page = oPdfFile.GetPdfReader().GetPageN(pgNumber);
PdfArray contentarray = page.GetAsArray(PdfName.CONTENTS);
PRStream stream;
string content;
if (contentarray != null)
{
//Loop through content
for (int j = 0; j < contentarray.Size; j++)
{
stream = (PRStream)contentarray.GetAsStream(j);
content = Encoding.ASCII.GetString(PdfReader.GetStreamBytes(stream));
string[] tokens = content.Split('\n');
for (int i = 0; i< tokens.Length; i++)
{
if (tokens[i].Contains("/QuickPDF"))
{
tokens[i] = string.Empty;
}
}
string outstr = string.Join("\n", tokens.Select(p => p).ToArray());
byte[] outbytes = Encoding.ASCII.GetBytes(outstr);
stream.SetData(outbytes);
}
}
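The line-filtering step in the loop above can be tried on its own, without a PDF library. Below is a minimal sketch in plain Java; the class `ContentStreamFilter` and the method `blankLinesMentioning` are hypothetical names, and the sample content stream is made up for illustration. It shares the same assumption as the C# code: the `Do` operator and its operand sit on one line, and nothing else worth keeping shares that line. A content stream that is not broken up by newlines this way would need a real tokenizer.

```java
// Hypothetical helper illustrating the line-based filter; not an iText API.
class ContentStreamFilter {
    // Blanks every line of the content stream that mentions the given
    // resource name, leaving all other lines untouched.
    static String blankLinesMentioning(String content, String resourceName) {
        String[] lines = content.split("\n", -1); // -1 keeps trailing empty lines
        for (int i = 0; i < lines.length; i++) {
            if (lines[i].contains(resourceName)) {
                lines[i] = "";
            }
        }
        return String.join("\n", lines);
    }

    public static void main(String[] args) {
        // Made-up content stream fragment.
        String content = "q\n/QuickPDF0 Do\nQ";
        System.out.println(blankLinesMentioning(content, "/QuickPDF"));
    }
}
```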

Hyperlink Detection from PDF

I have some PDFs containing hyperlinks, both URL and mailto. Is there any way or tool (possibly third-party) to extract the hyperlink meta information from the PDF, such as coordinates, link type and destination address? Any help is highly appreciated.
I have already tried iText and PDFBox without much success, and even some third-party software does not give me the desired output.
I have tried the following code in Java using iText:
PdfReader myReader = new PdfReader("pdf File Path");
PdfDictionary pageDict = myReader.getPageN(1);
PdfArray annots = pageDict.getAsArray(PdfName.ANNOTS);
System.out.println(annots);
ArrayList<String> dests = new ArrayList<String>();
if(annots != null)
{
for(int i=0; i<annots.size(); ++i)
{
PdfDictionary annotDict = annots.getAsDict(i);
PdfName subType = annotDict.getAsName(PdfName.SUBTYPE);
if (subType != null && PdfName.LINK.equals(subType))
{
PdfDictionary action = annotDict.getAsDict(PdfName.A);
if(action != null && PdfName.URI.equals(action.getAsName(PdfName.S)))
{
dests.add(action.getAsString(PdfName.URI).toString());
} // else { it's an internal link }
}
}
}
System.out.println(dests);
You can use the Docotic.Pdf library for link extraction (disclaimer: I work for the company).
Below is code that opens the specified file, finds all hyperlinks, collects information about the position of each link, and draws a rectangle around each link.
After that, the code saves a new PDF (with the links in rectangles) and a text file with the collected information. Finally, both created files are opened in their default viewers.
public static void ListAndHighlightLinks(string inputFile, string outputFile, string outputTxt)
{
using (PdfDocument doc = new PdfDocument(inputFile))
{
StringBuilder sb = new StringBuilder();
for (int i = 0; i < doc.Pages.Count; i++)
{
PdfPage page = doc.Pages[i];
foreach (PdfWidget widget in page.Widgets)
{
PdfActionArea actionArea = widget as PdfActionArea;
if (actionArea == null)
continue;
PdfUriAction linkAction = actionArea.Action as PdfUriAction;
if (linkAction == null)
continue;
Uri url = linkAction.Uri;
PdfRectangle rect = actionArea.BoundingBox;
// add information about found link into string buffer
sb.Append("Page ");
sb.Append(i.ToString());
sb.Append(" : ");
sb.Append(rect.ToString());
sb.Append(" ");
sb.AppendLine(url.ToString());
// draw rectangle around found link
page.Canvas.DrawRectangle(rect);
}
}
// save document with highlighted links and text information about links to files
doc.Save(outputFile);
System.IO.File.WriteAllText(outputTxt, sb.ToString());
// open created PDF and text file in default viewers
System.Diagnostics.Process.Start(outputTxt);
System.Diagnostics.Process.Start(outputFile);
}
}
You can use the sample code with a call like this:
ListAndHighlightLinks("input.pdf", "output.pdf", "links.txt");
If your PDFs are copy-protected, start with step 1; if they can be copied freely, start with step 2.
Step 1: Convert your PDFs to Word .doc format using Adobe Acrobat Pro or an online PDF-to-Word converter:
http://www.pdfonline.com/pdf2word/index.asp
Step 2: Copy and paste the whole document into the input window here (you can also download the lightweight HTML tool):
http://www.surf7.net/services/value-added-services/free-web-tools/email-extractor-lite/
Select 'url' as 'Type of address to extract', choose your separator, hit extract, and that's it.
Hope it works. Cheers.
One possibility would be a custom JavaScript in Acrobat that enumerates the "words" on the page and then reads out their Quads. From that you get the coordinates needed to create a link (or to compare with the links on the page), as well as the actual text (that's the "word(s)").
If it is "only" a matter of setting the border of the existing links, you can also write another Acrobat JavaScript that enumerates the links of the document and sets their border color property (you may need to set the width as well).
(If you prefer "buy" over "make", feel free to contact me in private; such things are part of my standard "repertoire".)

Use PDFBox to Merge Pages?

I know I can use PDFBox to merge multiple PDFs into one PDF. But is there a way to merge pages? For example, I have a header in a PDF and want it inserted at the top of the first page of the combined PDF, pushing everything else down. Is there a way to do this using the PDFBox API?
Here is some code that works to copy two files into a merged one with multiple copies of each one. It copies by pages. It's something I got using the information in the answer to this question: Can duplicating a pdf with PDFBox be small like with iText?
So all you have to do is to make one copy only of the first page of doc1 and one copy only of all pages of doc2. There's a comment where you'll have to make a change to leave off some pages.
final int COPIES = 1; // total copies
// Same code as linked answer mostly
PDDocument samplePdf = new PDDocument();
InputStream in1 = this.getClass().getResourceAsStream(DOC1_NAME);
PDDocument doc1 = PDDocument.load(in1);
List<PDPage> pages = (List<PDPage>) doc1.getDocumentCatalog().getAllPages();
// *** Change this loop to only copy the pages you want from DOC1
for (PDPage page : pages) {
for (int i = 0; i < COPIES; i++) { // loop for each additional copy
samplePdf.importPage(page);
}
}
// Same code again mostly
InputStream in2 = this.getClass().getResourceAsStream(DOC2_NAME);
PDDocument doc2 = PDDocument.load(in2);
pages = (List<PDPage>) doc2.getDocumentCatalog().getAllPages();
for (PDPage page : pages) {
for (int i = 0; i < COPIES; i++) { // loop for each additional copy
samplePdf.importPage(page);
}
}
// Then write the results out
File output = new File(OUT_NAME);
FileOutputStream out = new FileOutputStream(output);
samplePdf.save(out);
samplePdf.close(); // variable is declared as samplePdf above
out.close();
in1.close();
doc1.close();
in2.close();
doc2.close();

Insert PDF in PDF (NOT merging files)

I'd like to insert a PDF page in another PDF page scaled. I'd like to use iTextSharp for this.
I have a vector drawing which can be exported as a single page PDF file. I would like to add this file into a page of other PDF document just like I would add an image to a PDF document.
Is this possible?
The purpose of this is to retain the ability to zoom in without losing quality.
It is very hard to reproduce the vector drawing using PDF vectors because it is an extremely complex drawing.
Exporting the vector drawing as high resolution image is not an option since I have to use a lot of them in a single PDF document. The final PDF would be very large and its writing too slow.
This is relatively easy to do although there's a couple of ways to go about it. If you're creating a new document that has the other documents inside of it and nothing else then the easiest thing to use is probably the PdfWriter.GetImportedPage(PdfReader, Int). This will give you a PdfImportedPage (which inherits from PdfTemplate). Once you have that you can add it to your new document by using PdfWriter.DirectContent.AddTemplate(PdfImportedPage, Matrix).
There's a couple of overloads to AddTemplate() but the easiest one (at least for me) is the one that takes a System.Drawing.Drawing2D.Matrix. If you use this you can easily scale and translate (change x,y) without having to think in "matrix" terms.
Below is sample code that shows this off. It targets iTextSharp 5.4.0 although it should work pretty much the same with 4.1.6 if you remove the using statements. It first creates a sample PDF with 12 pages with random background colors. Then it creates a second document and adds each page from the first PDF scaled by 50% so that 4 old pages fit onto 1 new page. See the code comments for further details. This code assumes that all pages are the same size, you might need to perform further calculations if your situation differs.
//Test files that we'll be creating
var file1 = Path.Combine(Environment.GetFolderPath(Environment.SpecialFolder.Desktop), "File1.pdf");
var file2 = Path.Combine(Environment.GetFolderPath(Environment.SpecialFolder.Desktop), "File2.pdf");
//For test purposes we'll fill the pages with a random background color
var R = new Random();
//Standard PDF creation, nothing special here
using (var fs = new FileStream(file1, FileMode.Create, FileAccess.Write, FileShare.None)) {
using (var doc = new Document()) {
using (var writer = PdfWriter.GetInstance(doc, fs)) {
doc.Open();
//Create 12 pages with text on each one
for (int i = 1; i <= 12; i++) {
doc.NewPage();
//For test purposes fill the page with a random background color
var cb = writer.DirectContentUnder;
cb.SaveState();
cb.SetColorFill(new BaseColor(R.Next(0, 256), R.Next(0, 256), R.Next(0, 256)));
cb.Rectangle(0, 0, doc.PageSize.Width, doc.PageSize.Height);
cb.Fill();
cb.RestoreState();
//Add some text to the page
doc.Add(new Paragraph("This is page " + i.ToString()));
}
doc.Close();
}
}
}
//Create our combined file
using (var fs = new FileStream(file2, FileMode.Create, FileAccess.Write, FileShare.None)) {
using (var doc = new Document()) {
using (var writer = PdfWriter.GetInstance(doc, fs)) {
//Bind a reader to the file that we created above
using (var reader = new PdfReader(file1)) {
doc.Open();
//Get the number of pages in the original file
int pageCount = reader.NumberOfPages;
//Loop through each page
for (int i = 0; i < pageCount; i++) {
//We're putting four original pages on one new page so add a new page every four pages
if (i % 4 == 0) {
doc.NewPage();
}
//Get a page from the reader (remember that PdfReader pages are one-based)
var imp = writer.GetImportedPage(reader, (i + 1));
//A transform matrix is an easier way of dealing with changing dimensions and coordinates of a rectangle
var tm = new System.Drawing.Drawing2D.Matrix();
//Scale the image by half
tm.Scale(0.5f, 0.5f);
//PDF coordinates put 0,0 in the bottom left corner.
if (i % 4 == 0) {
tm.Translate(0, doc.PageSize.Height); //The first item on the page needs to be moved up "one square"
} else if (i % 4 == 1) {
tm.Translate(doc.PageSize.Width, doc.PageSize.Height); //The second needs to be moved up and over
} else if (i % 4 == 2) {
//Nothing needs to be done for the third
} else if (i % 4 == 3) {
tm.Translate(doc.PageSize.Width, 0); //The fourth needs to be moved over
}
//Add our imported page using the matrix that we set above
writer.DirectContent.AddTemplate(imp,tm);
}
doc.Close();
}
}
}
}
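The quadrant arithmetic in the loop above is easier to check in isolation. The sketch below is plain Java with no PDF library; the class `FourUpLayout` and the method `offset` are hypothetical names. It assumes, as the sample's results imply, that the translation is applied in the already-scaled coordinate space, so translating by a full page width or height ends up as an offset of half that amount on the new page.

```java
// Hypothetical helper mirroring the i % 4 branches of the sample; not an iTextSharp API.
class FourUpLayout {
    // Returns {x, y}: where the bottom-left corner of source page `index`
    // (0-based) lands on the new page once it has been scaled by 50%.
    // PDF coordinates put 0,0 in the bottom-left corner.
    static double[] offset(int index, double pageWidth, double pageHeight) {
        double halfW = pageWidth * 0.5;
        double halfH = pageHeight * 0.5;
        switch (index % 4) {
            case 0:  return new double[] {0, halfH};      // first: top-left
            case 1:  return new double[] {halfW, halfH};  // second: top-right
            case 2:  return new double[] {0, 0};          // third: bottom-left
            default: return new double[] {halfW, 0};      // fourth: bottom-right
        }
    }

    public static void main(String[] args) {
        // A4 in points is roughly 595 x 842.
        for (int i = 0; i < 4; i++) {
            double[] o = offset(i, 595, 842);
            System.out.println(i + " -> " + o[0] + ", " + o[1]);
        }
    }
}
```

If the pages are not all the same size, the half-width and half-height would have to be computed per page instead of once for the whole document.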
In addition: while I was trying to add a rotated PDF to a rotated PDF, I ran into some rotation problems. It is a bit confusing, but you should check the "PdfImportedPage.Rotation" of the page that is going to be added to the PDF.
PdfImportedPage page;//page = writer.GetImportedPage(PdfReader reader, int pageNum);
PdfContentByte pcb;//pcb = PdfWriter.DirectContentUnder;
//create matrix to use for rotating imported page
Matrix matrix = new Matrix(a, b, c, d, e, f);
matrix.Rotate(-(page.Rotation));
if (page.Rotation != 0)
pcb.AddTemplate(page, matrix, true);
else
pcb.AddTemplate(page, a, b, c, d, e, f, true);
The code may look silly, but I want to draw your attention to "matrix.Rotate(negative rotation of imported page)".