What is the alternative to PdfStamper in iText 7? - asp.net-core

I tried to find the alternative to PdfStamper in iText 7 but couldn't work out how to use it. I've already implemented this code in iTextSharp, where it works, but it doesn't work in iText 7.
One more question: what is the alternative to AcroFields in iText 7?
public byte[] GeneratePDF(string pdfPath, Dictionary<string, string> formFieldMap, bool formFlattening = true)
{
    var output = new MemoryStream();
    var reader = new PdfReader(pdfPath);
    var stamper = new PdfStamper(reader, output);
    //PdfDocument pdfDocument = new PdfDocument(reader, writer);
    var formFields = stamper.AcroFields;
    foreach (var fieldName in formFieldMap.Keys)
        formFields.SetField(fieldName, formFieldMap[fieldName]);
    stamper.FormFlattening = formFlattening;
    stamper.Close();
    reader.Close();
    return output.ToArray();
}

The iText API was completely overhauled between versions 5.x and 7.x, so there is not always a one-to-one correspondence between the old classes and the new ones. I would therefore suggest studying the introductory ebooks on the iText knowledge base site before porting code.
There actually is an example in those ebooks very similar to your code:
//Initialize PDF document
PdfDocument pdf = new PdfDocument(new PdfReader(src), new PdfWriter(dest));
PdfAcroForm form = PdfAcroForm.GetAcroForm(pdf, true);
IDictionary<String, PdfFormField> fields = form.GetFormFields();
PdfFormField toSet;
fields.TryGetValue("name", out toSet);
toSet.SetValue("James Bond");
fields.TryGetValue("language", out toSet);
toSet.SetValue("English");
fields.TryGetValue("experience1", out toSet);
toSet.SetValue("Off");
fields.TryGetValue("experience2", out toSet);
toSet.SetValue("Yes");
fields.TryGetValue("experience3", out toSet);
toSet.SetValue("Yes");
fields.TryGetValue("shift", out toSet);
toSet.SetValue("Any");
fields.TryGetValue("info", out toSet);
toSet.SetValue("I was 38 years old when I became an MI6 agent.");
form.FlattenFields();
pdf.Close();
("Flattening a Form" in "Chapter 4: Making a PDF interactive | .NET" of "iText 7: Jump-Start Tutorial for .NET")
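Applied to the method from the question, the same calls might look like the sketch below. It assumes the iText 7 for .NET namespaces iText.Kernel.Pdf and iText.Forms, keeps the question's method signature, and looks fields up with PdfAcroForm.GetField. Skipping names that are missing from the form is a choice made here, not something the tutorial prescribes, and the class name PdfFormFiller is only illustrative.

```csharp
using System.Collections.Generic;
using System.IO;
using iText.Forms;
using iText.Forms.Fields;
using iText.Kernel.Pdf;

public class PdfFormFiller
{
    public byte[] GeneratePDF(string pdfPath, Dictionary<string, string> formFieldMap, bool formFlattening = true)
    {
        var output = new MemoryStream();

        // A PdfDocument built from a reader and a writer plays the role of PdfStamper.
        var pdf = new PdfDocument(new PdfReader(pdfPath), new PdfWriter(output));

        // PdfAcroForm plays the role of AcroFields.
        PdfAcroForm form = PdfAcroForm.GetAcroForm(pdf, true);

        foreach (var entry in formFieldMap)
        {
            // Assumption: names that are not present in the form are skipped silently.
            PdfFormField field = form.GetField(entry.Key);
            if (field != null)
                field.SetValue(entry.Value);
        }

        if (formFlattening)
            form.FlattenFields();

        // Closing the document writes the result into the stream.
        pdf.Close();

        // ToArray still works after the MemoryStream has been closed.
        return output.ToArray();
    }
}
```

There is no separate reader.Close() call because closing the PdfDocument also closes the reader and writer it was created with.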

Related

Merge pdfs with NReco PdfGenerator

In the features section on NReco's site, the examples list has a line about MergePdf.
I have looked in the API reference and used IntelliSense in Visual Studio, but I can't find anything.
I want to merge several PDFs before I send them in a mail. The PDFs are generated with NReco wkhtmltopdf with different headers and footers, which I could not get to work in a single generation run, so I split the generation and now I want to merge the PDFs again.
Or do I have to get yet another library involved?
Just sharing what I ended up with. At least for now.
It is a modification of the suggested solution with iTextSharp.
public static byte[] MergePdfs(IEnumerable<byte[]> pdfs)
{
    using (var memoryStream = new MemoryStream())
    {
        var document = new Document(PageSize.A4);
        var writer = PdfWriter.GetInstance(document, memoryStream);
        document.Open();
        var writerDirectContent = writer.DirectContent;
        foreach (var pdf in pdfs)
        {
            var pdfReader = new PdfReader(pdf);
            var numberOfPages = pdfReader.NumberOfPages;
            for (var currentPageNumber = 1; currentPageNumber <= numberOfPages; currentPageNumber++)
            {
                document.SetPageSize(PageSize.A4);
                document.NewPage();
                var page = writer.GetImportedPage(pdfReader, currentPageNumber);
                writerDirectContent.AddTemplate(page, 1f, 0, 0, 1f, 0, 0);
            }
        }
        document.Close();
        return memoryStream.ToArray();
    }
}
There are two ways to achieve the goal you mentioned:
1. Use the GeneratePdfFromFiles method, which accepts an array of WkHtmlInput structures that let you specify the header/footer separately for each input HTML file. The result is a single PDF. Note that this method requires a valid license key and is not available to users of the free library.
2. Generate several PDFs in the standard way, and then merge them into one resulting PDF with the help of the iTextSharp library (the free LGPL version 4.1.6 can be used for this purpose).

Reading PDF Bookmarks in VB.NET using iTextSharp

I am making a tool that scans PDF files and searches for text in PDF bookmarks and body text. I am using Visual Studio 2008 with VB.NET with iTextSharp.
How do I load the list of bookmarks from an existing PDF file?
It depends on what you mean by "bookmarks".
You want the outlines (the entries that are visible in the bookmarks panel):
The CreateOutlineTree example shows you how to use the SimpleBookmark class to create an XML file containing the complete outline tree (in PDF jargon, bookmarks are called outlines).
Java:
PdfReader reader = new PdfReader(src);
List<HashMap<String, Object>> list = SimpleBookmark.getBookmark(reader);
SimpleBookmark.exportToXML(list,
    new FileOutputStream(dest), "ISO8859-1", true);
reader.close();
C#:
PdfReader reader = new PdfReader(pdfIn);
var list = SimpleBookmark.GetBookmark(reader);
using (MemoryStream ms = new MemoryStream()) {
    SimpleBookmark.ExportToXML(list, ms, "ISO8859-1", true);
    ms.Position = 0;
    using (StreamReader sr = new StreamReader(ms)) {
        return sr.ReadToEnd();
    }
}
The list object can also be used to examine the different bookmark elements one by one programmatically (this is all explained in the official documentation).
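As a rough illustration of walking that list in code, the sketch below prints every outline title, indented by nesting level. It assumes iTextSharp 5.x, where SimpleBookmark.GetBookmark returns an IList<Dictionary<string, object>> and nested entries sit under a "Kids" key; older 4.x releases use non-generic collections instead. The class and method names are made up for this example.

```csharp
using System;
using System.Collections.Generic;
using iTextSharp.text.pdf;

public static class OutlinePrinter
{
    public static void PrintOutlines(string pdfPath)
    {
        var reader = new PdfReader(pdfPath);
        try
        {
            // GetBookmark returns null when the document has no outlines.
            var outlines = SimpleBookmark.GetBookmark(reader);
            if (outlines != null)
                Print(outlines, 0);
        }
        finally
        {
            reader.Close();
        }
    }

    private static void Print(IList<Dictionary<string, object>> entries, int depth)
    {
        foreach (var entry in entries)
        {
            // Each entry is a dictionary; "Title" holds the visible bookmark text.
            object title;
            entry.TryGetValue("Title", out title);
            Console.WriteLine(new string(' ', depth * 2) + title);

            // Assumption: child outlines are stored under "Kids" with the same shape.
            object kids;
            if (entry.TryGetValue("Kids", out kids))
            {
                var children = kids as IList<Dictionary<string, object>>;
                if (children != null)
                    Print(children, depth + 1);
            }
        }
    }
}
```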
You want the named destinations (specific places in the document you can link to by name):
If you actually meant named destinations, then you need the SimpleNamedDestination class, as shown in the LinkActions example:
Java:
PdfReader reader = new PdfReader(src);
HashMap<String,String> map = SimpleNamedDestination.getNamedDestination(reader, false);
SimpleNamedDestination.exportToXML(map, new FileOutputStream(dest),
    "ISO8859-1", true);
reader.close();
C#:
PdfReader reader = new PdfReader(src);
Dictionary<string,string> map = SimpleNamedDestination
    .GetNamedDestination(reader, false);
using (MemoryStream ms = new MemoryStream()) {
    SimpleNamedDestination.ExportToXML(map, ms, "ISO8859-1", true);
    ms.Position = 0;
    using (StreamReader sr = new StreamReader(ms)) {
        return sr.ReadToEnd();
    }
}
The map object can also be used to examine the different named destinations one by one programmatically. Note the Boolean parameter that is used when retrieving the named destinations. Named destinations can be stored using a PDF name object as name, or using a PDF string object. The Boolean parameter indicates whether you want the former (true = stored as PDF name objects) or the latter (false = stored as PDF string objects) type of named destinations.
Named destinations are predefined targets in a PDF file that can be found through their name. Although the official name is named destinations, some people refer to them as bookmarks too (but when we say bookmarks in the context of PDF, we usually want to refer to outlines).
In case someone is still looking for a VB.NET solution, here is a simplified one. I have a large number of PDFs created with Report Builder, and the document map automatically adds a bookmark with a "Title" entry. With iTextSharp I read the PDF and extract just the value of the first bookmark:
Dim oReader As New iTextSharp.text.pdf.PdfReader(PdfFileName)
Dim list As Object
list = SimpleBookmark.GetBookmark(oReader)
Dim string_book As String
string_book = list(0).item("Title")
It is a small, very simple example for anyone looking for a starting point to understand how this works.

Insert pages into PDF

Is there any function like Document.InsertPage(pageIndex)? Or any alternative solution?
What you are looking for is PdfStamper.insertPage(int, Rectangle).
See a full example of how to use it here, but in short, it should boil down to:
PdfReader reader = new PdfReader(src);
PdfStamper stamper = new PdfStamper(reader, new FileOutputStream(dest));
stamper.insertPage(pageIndex, reader.getPageSizeWithRotation(1));
// insert content via stamper.getUnderContent() or stamper.getOverContent()
stamper.close();
reader.close();
Note that this is Java code, but the C# counterpart can be deduced quite easily.
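For reference, a C# counterpart might look like the sketch below. It assumes iTextSharp 5.x, where the Java method names map to PascalCase (InsertPage, GetPageSizeWithRotation). The wrapper class and method names are invented for the example.

```csharp
using System.IO;
using iTextSharp.text.pdf;

public static class PageInserter
{
    // Inserts an empty page at pageIndex (1-based), sized like the first page of the source.
    public static void InsertBlankPage(string src, string dest, int pageIndex)
    {
        var reader = new PdfReader(src);
        using (var output = new FileStream(dest, FileMode.Create, FileAccess.Write))
        {
            var stamper = new PdfStamper(reader, output);
            stamper.InsertPage(pageIndex, reader.GetPageSizeWithRotation(1));
            // insert content via stamper.GetUnderContent() or stamper.GetOverContent()
            stamper.Close();
        }
        reader.Close();
    }
}
```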

Existing PDF to PDF/A "conversion"

I am trying to turn an existing PDF into PDF/A-1b. I understand that iText cannot convert a PDF to PDF/A in the sense of making it PDF/A compliant, but it definitely can flag the document as PDF/A. However, I have looked at various examples and cannot figure out how to do it. The major problem is that
writer.PDFXConformance = PdfWriter.PDFA1B;
does not work anymore. First, PDFA1B is not recognized; second, PdfWriter seems to have been rewritten, and there is not much information about that.
It seems the only way (in the Java version of iText) is:
PdfAWriter writer = PdfAWriter.getInstance(document, new FileOutputStream(filename), PdfAConformanceLevel.PDF_A_1B);
But that requires a Document object, i.e. it can only be used when creating a PDF from scratch.
Can someone give an example of pdf to pdf/a conversion with the current version of itextsharp?
Thank you.
I can't imagine a valid reason for doing this but apparently you have one.
The conformance settings in iText are intended to be used with a PdfWriter and that object is (generally) only intended to be used with new documents. Since iText was never intended to convert documents to conformance that's just the way it was built.
To do what you want to do you could either just open the original document and update the appropriate tags in the document's dictionary or you could create a new document with the appropriate entries set and then import your old document. The below code shows the latter route, it first creates a regular non-conforming PDF and then creates a second document that says it is conforming even though it may or may not. See the code comments for more details. This targets iTextSharp 5.4.2.0.
//Folder that we're working from
var workingFolder = Environment.GetFolderPath(Environment.SpecialFolder.Desktop);
//Create a regular non-conformant PDF, nothing special below
var RegularPdf = Path.Combine(workingFolder, "File1.pdf");
using (var fs = new FileStream(RegularPdf, FileMode.Create, FileAccess.Write, FileShare.None)) {
    using (var doc = new Document()) {
        using (var writer = PdfWriter.GetInstance(doc, fs)) {
            doc.Open();
            doc.Add(new Paragraph("Hello world!"));
            doc.Close();
        }
    }
}
//Create our conformant document from the above file
var ConformantPdf = Path.Combine(workingFolder, "File2.pdf");
using (var fs = new FileStream(ConformantPdf, FileMode.Create, FileAccess.Write, FileShare.None)) {
    using (var doc = new Document()) {
        //Use PdfSmartCopy to get every page
        using (var copy = new PdfSmartCopy(doc, fs)) {
            //Set our conformance levels
            //Note: PDFX1A2001 is the PDF/X-1a:2001 flag, not a PDF/A flag
            copy.SetPdfVersion(PdfWriter.PDF_VERSION_1_3);
            copy.PDFXConformance = PdfWriter.PDFX1A2001;
            //Open our new document for writing
            doc.Open();
            //Bring in every page from the old PDF
            using (var r = new PdfReader(RegularPdf)) {
                for (var i = 1; i <= r.NumberOfPages; i++) {
                    copy.AddPage(copy.GetImportedPage(r, i));
                }
            }
            //Close up
            doc.Close();
        }
    }
}
Just to be 100% clear, this WILL NOT MAKE A CONFORMANT PDF, just a document that says it conforms.

Lucene HTMLFormatter skipping last character

I have this simple Lucene search code (Modified from http://www.lucenetutorial.com/lucene-in-5-minutes.html)
class Program
{
    static void Main(string[] args)
    {
        StandardAnalyzer analyzer = new StandardAnalyzer();
        Directory index = new RAMDirectory();
        IndexWriter w = new IndexWriter(index, analyzer, true,
            IndexWriter.MaxFieldLength.UNLIMITED);
        addDoc(w, "Table 1 <table> content </table>");
        addDoc(w, "Table 2");
        addDoc(w, "<table> content </table>");
        addDoc(w, "The Art of Computer Science");
        w.Close();
        String querystr = "table";
        Query q = new QueryParser("title", analyzer).Parse(querystr);
        Lucene.Net.Search.IndexSearcher searcher = new
            Lucene.Net.Search.IndexSearcher(index);
        Hits hitsFound = searcher.Search(q);
        SimpleHTMLFormatter formatter = new SimpleHTMLFormatter("*", "*");
        Highlighter highlighter = null;
        highlighter = new Highlighter(formatter, new QueryScorer(searcher.Rewrite(q)));
        for (int i = 0; i < hitsFound.Length(); i++)
        {
            Console.WriteLine(highlighter.GetBestFragment(analyzer, "title", hitsFound.Doc(i).Get("title")));
            // Console.WriteLine(hitsFound.Doc(i).Get("title"));
        }
        Console.ReadKey();
    }
    private static void addDoc(IndexWriter w, String value)
    {
        Document doc = new Document();
        doc.Add(new Field("title", value, Field.Store.YES, Field.Index.ANALYZED));
        w.AddDocument(doc);
    }
}
The highlighted results always seem to skip the closing '>' of my last table tag. Any suggestions?
Lucene's highlighter, out of the box, is geared to handle plain text. It will work incorrectly if you try to highlight HTML or any mark-up text.
I recently ran into the same problem and found a solution in Solr's HTMLStripReader, which skips the content inside tags. The solution is outlined on my blog at the following URL:
http://sigabrt.blogspot.com/2010/04/highlighting-query-in-entire-html.html
I could have posted the code here, but my solution applies to Lucene Java. For .NET, you will have to find an equivalent of HTMLStripReader.
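As a very rough stand-in on .NET, the tags can be removed before the text is indexed and highlighted. The sketch below is not HTMLStripReader and not part of Lucene.Net: it is a plain regular-expression filter (the MarkupStripper name is made up) that will mishandle comments, scripts and a literal ">" inside attribute values, so treat it only as an illustration of feeding the highlighter plain text.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

public static class MarkupStripper
{
    private static readonly Regex Tags = new Regex("<[^>]*>", RegexOptions.Compiled);
    private static readonly Regex Whitespace = new Regex(@"\s+", RegexOptions.Compiled);

    // Replaces every tag with a space, decodes entities, then collapses runs of whitespace.
    public static string ToPlainText(string html)
    {
        string withoutTags = Tags.Replace(html, " ");
        string decoded = WebUtility.HtmlDecode(withoutTags);
        return Whitespace.Replace(decoded, " ").Trim();
    }

    public static void Main()
    {
        // prints "Table 1 content"
        Console.WriteLine(ToPlainText("Table 1 <table> content </table>"));
    }
}
```

The stripped string would then be passed both to the indexed field and to GetBestFragment, so the highlighter never sees markup.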
Solved. Apparently my Highlighter.Net version was archaic. Upgrading to 2.3.2.1 solved the problem.