ITextSharp throws an error when you attempt to create a PdfTable with 0 columns.
I have a requirement to take XHTML that is generated using an XSLT transformation and generate a PDF from it. Currently I am using ITextSharp to do so. The problem that I am having is the XHTML that is generated sometimes contains tables with 0 rows, so when ITextSharp attempts to parse them into a table it throws and error saying there are 0 columns in the table.
The reason it says 0 columns is because ITextSharp sets the number of columns in the table to the maximum of the number of columns in each row, and since there are no rows the max number of columns in any given row is 0.
How do I go about catching these HTML table declarations with 0 rows and stop them from being parsed into PDF elements?
I've found the piece of code that is causing the error is within the HtmlPipeline, so I could copy and paste the implementation into a class extending HtmlPipeline and overriding its methods and then do my logic to check for empty tables there, but that seems sloppy and inefficient.
Is there a way to catch the empty table before it is parsed?
=Solution=
The Tag Processor
public class EmptyTableTagProcessor : Table
{
public override IList<IElement> End(IWorkerContext ctx, Tag tag, IList<IElement> currentContent)
{
if (currentContent.Count > 0)
{
return base.End(ctx, tag, currentContent);
}
return new List<IElement>();
}
}
And using the Tag Processor...
//CSS
var cssResolver = XMLWorkerHelper.GetInstance().GetDefaultCssResolver(true);
//HTML
var fontProvider = new XMLWorkerFontProvider();
var cssAppliers = new CssAppliersImpl(fontProvider);
var tagProcessorFactory = Tags.GetHtmlTagProcessorFactory();
tagProcessorFactory.AddProcessor(new EmptyTableTagProcessor(), new string[] { "table" });
var htmlContext = new HtmlPipelineContext(cssAppliers);
htmlContext.SetTagFactory(tagProcessorFactory);
//PIPELINE
var pipeline =
new CssResolverPipeline(cssResolver,
new HtmlPipeline(htmlContext,
new PdfWriterPipeline(document, pdfWriter)));
//XML WORKER
var xmlWorker = new XMLWorker(pipeline, true);
using (var stringReader = new StringReader(html))
{
xmlParser.Parse(stringReader);
}
This solution removes the empty table tags and still writes the PDF as a part of the pipeline.
You should be able to write your own tag processor that accounts for that scenario by subclassing iTextSharp.tool.xml.html.AbstractTagProcessor. In fact, to make your life even easier you can subclass the already existing more specific iTextSharp.tool.xml.html.table.Table:
public class TableTagProcessor : iTextSharp.tool.xml.html.table.Table {
public override IList<IElement> End(IWorkerContext ctx, Tag tag, IList<IElement> currentContent) {
//See if we've got anything to work with
if (currentContent.Count > 0) {
//If so, let our parent class worry about it
return base.End(ctx, tag, currentContent);
}
//Otherwise return an empty list which should make everyone happy
return new List<IElement>();
}
}
Unfortunately, if you want to use a custom tag processor you can't use the shortcut XMLWorkerHelper class and instead you'll need to parse the HTML into elements and add them to your document. To do that you'll need an instance of iTextSharp.tool.xml.IElementHandler which you can create like:
public class SampleHandler : iTextSharp.tool.xml.IElementHandler {
//Generic list of elements
public List<IElement> elements = new List<IElement>();
//Add the supplied item to the list
public void Add(IWritable w) {
if (w is WritableElement) {
elements.AddRange(((WritableElement)w).Elements());
}
}
}
You can use the above with the following code which includes some sample invalid HTML.
//Hold everything in memory
using (var ms = new MemoryStream()) {
//Create new PDF document
using (var doc = new Document()) {
using (var writer = PdfWriter.GetInstance(doc, ms)) {
doc.Open();
//Sample HTML
string html = "<table><tr><td>Hello</td></tr></table><table></table>";
//Create an instance of our element helper
var XhtmlHelper = new SampleHandler();
//Begin pipeline
var htmlContext = new HtmlPipelineContext(null);
//Get the default tag processor
var tagFactory = iTextSharp.tool.xml.html.Tags.GetHtmlTagProcessorFactory();
//Add an instance of our new processor
tagFactory.AddProcessor(new TableTagProcessor(), new string[] { "table" });
//Bind the above to the HTML context part of the pipeline
htmlContext.SetTagFactory(tagFactory);
//Get the default CSS handler and create some boilerplate pipeline stuff
var cssResolver = XMLWorkerHelper.GetInstance().GetDefaultCssResolver(false);
var pipeline = new CssResolverPipeline(cssResolver, new HtmlPipeline(htmlContext, new ElementHandlerPipeline(XhtmlHelper, null)));//Here's where we add our IElementHandler
//The worker dispatches commands to the pipeline stuff above
var worker = new XMLWorker(pipeline, true);
//Create a parser with the worker listed as the dispatcher
var parser = new XMLParser();
parser.AddListener(worker);
//Finally, parse our HTML directly.
using (TextReader sr = new StringReader(html)) {
parser.Parse(sr);
}
//The above did not touch our document. Instead, all "proper" elements are stored in our helper class XhtmlHelper
foreach (var element in XhtmlHelper.elements) {
//Add these to the main document
doc.Add(element);
}
doc.Close();
}
}
}
Related
Im working on a webApi using dotnet core that takes the excel file from IFormFile and reads its content.Iam following the article
https://levelup.gitconnected.com/reading-an-excel-file-using-an-asp-net-core-mvc-application-2693545577db which is doing the same thing except that the file here is present on the server and mine will be provided by user.
here is the code:
public IActionResult Test(IFormFile file)
{
List<UserModel> users = new List<UserModel>();
System.Text.Encoding.RegisterProvider(System.Text.CodePagesEncodingProvider.Instance);
using (var stream = System.IO.File.Open(file.FileName, FileMode.Open, FileAccess.Read))
{
using (var reader = ExcelReaderFactory.CreateReader(stream))
{
while (reader.Read()) //Each row of the file
{
users.Add(new UserModel
{
Name = reader.GetValue(0).ToString(),
Email = reader.GetValue(1).ToString(),
Phone = reader.GetValue(2).ToString()
});
}
}
}
return Ok(users);
}
}
When system.IO tries to open the file, it could not find the path as the path is not present. How it is possible to either get the file path (that would vary based on user selection of file)? are there any other ways to make it possible.
PS: I dont want to upload the file on the server first, then read it.
You're using the file.FileName property, which refers to the file name the browser send. It's good to know, but not a real file on the server yet. You have to use the CopyTo(Stream) Method to access the data:
public IActionResult Test(IFormFile file)
{
List<UserModel> users = new List<UserModel>();
System.Text.Encoding.RegisterProvider(System.Text.CodePagesEncodingProvider.Instance);
using (var stream = new MemoryStream())
{
file.CopyTo(stream);
stream.Position = 0;
using (var reader = ExcelReaderFactory.CreateReader(stream))
{
while (reader.Read()) //Each row of the file
{
users.Add(new UserModel{Name = reader.GetValue(0).ToString(), Email = reader.GetValue(1).ToString(), Phone = reader.GetValue(2).ToString()});
}
}
}
return Ok(users);
}
Reference
I want to add a functionality of adding a watermark using itextSharp library to the pdf document that is being added to the library. For this I created an event listener that is triggered when item is being added. The code is as follows :
using System;
using System.Security.Permissions;
using Microsoft.SharePoint;
using Microsoft.SharePoint.Utilities;
using Microsoft.SharePoint.Workflow;
using iTextSharp.text;
using iTextSharp.text.pdf;
using System.IO;
namespace ProjectPrac.WaterMarkOnUpload
{
/// <summary>
/// List Item Events
/// </summary>
public class WaterMarkOnUpload : SPItemEventReceiver
{
/// <summary>
/// An item is being added.
/// </summary>
public override void ItemAdding(SPItemEventProperties properties)
{
base.ItemAdding(properties);
string watermarkedFile = "Watermarked.pdf";
// Creating watermark on a separate layer
// Creating iTextSharp.text.pdf.PdfReader object to read the Existing PDF Document
PdfReader reader1 = new PdfReader("C:\\Users\\Desktop\\Hello.pdf"); //THE RELATIVE PATH
using (FileStream fs = new FileStream(watermarkedFile, FileMode.Create, FileAccess.Write, FileShare.None))
// Creating iTextSharp.text.pdf.PdfStamper object to write Data from iTextSharp.text.pdf.PdfReader object to FileStream object
using (PdfStamper stamper = new PdfStamper(reader1, fs))
{
// Getting total number of pages of the Existing Document
int pageCount = reader1.NumberOfPages;
// Create New Layer for Watermark
PdfLayer layer = new PdfLayer("WatermarkLayer", stamper.Writer);
// Loop through each Page
for (int i = 1; i <= pageCount; i++)
{
// Getting the Page Size
Rectangle rect = reader1.GetPageSize(i);
// Get the ContentByte object
PdfContentByte cb = stamper.GetUnderContent(i);
// Tell the cb that the next commands should be "bound" to this new layer
cb.BeginLayer(layer);
cb.SetFontAndSize(BaseFont.CreateFont(
BaseFont.HELVETICA, BaseFont.CP1252, BaseFont.NOT_EMBEDDED), 50);
PdfGState gState = new PdfGState();
gState.FillOpacity = 0.25f;
cb.SetGState(gState);
cb.SetColorFill(BaseColor.BLACK);
cb.BeginText();
cb.ShowTextAligned(PdfContentByte.ALIGN_CENTER, "Confidential", rect.Width / 2, rect.Height / 2, 45f);
cb.EndText();
// Close the layer
cb.EndLayer();
}
}
}
I want to know how to add the path without hardcoding it here :
PdfReader reader1 = new PdfReader("C:\\Users\\Desktop\\Hello.pdf"); //THE RELATIVE PATH
And then uploading the watermarked document to the library and not the original pdf.
I know that it can also be done through workflow but I am pretty new to sharepoint. So if at all you have an answer that has workflow in it please give the link that explains the workflow for automating the pdf watermarking.
You don't need to have workflow to achieve what you are looking for:
First, use ItemAdded event instead of ItemAdding. Then you can access SPFile associated with updated list item.
public override void ItemAdded(SPItemEventProperties properties)
{
var password = string.Empty; //or you put some password handling
SPListItem listItemToFile = properties.Listitem;
SPFile pdfOriginalFile = listItemToFile.File;
//get byte[] of uploaded file
byte[] contentPdfOriginalFile = pdfOriginalFile.OpenBinary();
//create reader from byte[]
var pdfReader = new PdfReader(new RandomAccessFileOrArray(contentPdfOriginalFile), password);
using (var ms = new MemoryStream()) {
using (var stamper = new PdfStamper(pdfReader, ms, '\0', true)) {
// do your watermarking stuff
...
// resuming SP stuff
}
var watermarkedPdfContent = ms.ToArray();
base.EventFiringEnabled = false; //to prevent other events being fired
var folder = pdfOriginalFile.ParentFolder;//you want to upload to the same place
folder.Files.Add(contentPdfOriginalFile.Name, fs.ToArray(),true);
base.EventFiringEnabled = true;
}
}
I probably did a typo or two since I didn't run this code. However, it should give you an idea.
I have written some code that merges together multiple PDF's into a single PDF that I then display from the MemoryStream. This works great. What I need to do is add a table of contents to the end of the file with links to the start of each of the individual PDF's. I planned on doing this using the GotoLocalPage action which has an option for page numbers but it doesn't seem to work. If I change the action to the code below to one of the presset ones like PDFAction.FIRSTPAGE it works fine. Does this not work because I am using the PDFCopy object for the writer parameter of GotoLocalPage?
Document mergedDoc = new Document();
MemoryStream ms = new MemoryStream();
PdfCopy copy = new PdfCopy(mergedDoc, ms);
mergedDoc.Open();
MemoryStream tocMS = new MemoryStream();
Document tocDoc = null;
PdfWriter tocWriter = null;
for (int i = 0; i < filesToMerge.Length; i++)
{
string filename = filesToMerge[i];
PdfReader reader = new PdfReader(filename);
copy.AddDocument(reader);
// Initialise TOC document based off first file
if (i == 0)
{
tocDoc = new Document(reader.GetPageSizeWithRotation(1));
tocWriter = PdfWriter.GetInstance(tocDoc, tocMS);
tocDoc.Open();
}
// Create link for TOC, added random number of 3 for now
Chunk link = new Chunk(filename);
PdfAction action = PdfAction.GotoLocalPage(3, new PdfDestination(PdfDestination.FIT), copy);
link.SetAction(action);
tocDoc.Add(new Paragraph(link));
}
// Add TOC to end of merged PDF
tocDoc.Close();
PdfReader tocReader = new PdfReader(tocMS.ToArray());
copy.AddDocument(tocReader);
copy.Close();
displayPDF(ms.ToArray());
I guess an alternative would be to link to a named element (instead of page number) but I can't see how to add an 'invisible' element to the start of each file before adding to the merged document?
I would just go with two passes. In your first pass, do the merge as you are but also record the filename and page number it should link to. In your second pass, use a PdfStamper which will give you access to a ColumnText that you can use general abstractions like Paragraph in. Below is a sample that shows this off:
Since I don't have your documents, the below code creates 10 documents with a random number of pages each just for testing purposes. (You obviously don't need to do this part.) It also creates a simple dictionary with a fake file name as the key and the raw bytes from the PDF as a value. You have a true file collection to work with but you should be able to adapt that part.
//Create a bunch of files, nothing special here
//files will be a dictionary of names and the raw PDF bytes
Dictionary<string, byte[]> Files = new Dictionary<string, byte[]>();
var r = new Random();
for (var i = 1; i <= 10; i++) {
using (var ms = new MemoryStream()) {
using (var doc = new Document()) {
using (var writer = PdfWriter.GetInstance(doc, ms)) {
doc.Open();
//Create a random number of pages
for (var j = 1; j <= r.Next(1, 5); j++) {
doc.NewPage();
doc.Add(new Paragraph(String.Format("Hello from document {0} page {1}", i, j)));
}
doc.Close();
}
}
Files.Add("File " + i.ToString(), ms.ToArray());
}
}
This next block merges the PDFs. This is mostly the same as your code except that instead of writing a TOC here I'm just keeping track of what I want to write in the future. Where I'm using file.value you'd use your full file path and where I'm using file.key you'd use your file's name instead.
//Dictionary of file names (for display purposes) and their page numbers
var pages = new Dictionary<string, int>();
//PDFs start at page 1
var lastPageNumber = 1;
//Will hold the final merged PDF bytes
byte[] mergedBytes;
//Most everything else below is standard
using (var ms = new MemoryStream()) {
using (var document = new Document()) {
using (var writer = new PdfCopy(document, ms)) {
document.Open();
foreach (var file in Files) {
//Add the current page at the previous page number
pages.Add(file.Key, lastPageNumber);
using (var reader = new PdfReader(file.Value)) {
writer.AddDocument(reader);
//Increment our current page index
lastPageNumber += reader.NumberOfPages;
}
}
}
}
mergedBytes = ms.ToArray();
}
This last block actually writes the TOC. If we use a PdfStamper we can create a ColumnText which allows us to use Paragraphs
//Will hold the final PDF
byte[] finalBytes;
using (var ms = new MemoryStream()) {
using (var reader = new PdfReader(mergedBytes)) {
using (var stamper = new PdfStamper(reader, ms)) {
//The page number to insert our TOC into
var tocPageNum = reader.NumberOfPages + 1;
//Arbitrarily pick one page to use as the size of the PDF
//Additional logic could be added or this could just be set to something like PageSize.LETTER
var tocPageSize = reader.GetPageSize(1);
//Arbitrary margin for the page
var tocMargin = 20;
//Create our new page
stamper.InsertPage(tocPageNum, tocPageSize);
//Create a ColumnText object so that we can use abstractions like Paragraph
var ct = new ColumnText(stamper.GetOverContent(tocPageNum));
//Set the working area
ct.SetSimpleColumn(tocPageSize.GetLeft(tocMargin), tocPageSize.GetBottom(tocMargin), tocPageSize.GetRight(tocMargin), tocPageSize.GetTop(tocMargin));
//Loop through each page
foreach (var page in pages) {
var link = new Chunk(page.Key);
var action = PdfAction.GotoLocalPage(page.Value, new PdfDestination(PdfDestination.FIT), stamper.Writer);
link.SetAction(action);
ct.AddElement(new Paragraph(link));
}
ct.Go();
}
}
finalBytes = ms.ToArray();
}
I am using the PdfPageEventHelper in order to be able to add Header and Footer to each page of my document automatically. As was mentioned in many places I am doing so while overriding the OnEndPage.
On my class I am creating:
PdfDocument
creating a FileStream
getting a PdfWriter vis the the static GetInstance method
setting the specific PdfPageEventHelper class that I've created to the writer.PageEvent
adding the writer to the document
calling the document open
adding some content to the document (one very small table having one row)
calling the close document
Now - at step #8 the OnEndPage is being called, which is great, but somehow it is being called twice!
both times it is called to page number 1 (as I see on runtime in the document parameter)
therefore I am getting 2 pages in my document instead of one, where the second page is empty, and the first page is actually having my header and footer twice (overlapping).
I am using iTextSharp version 5.5.1.0 , and I saw on the source file that in Document.Close method they are calling NewPage function... this is why I am getting to OnEndPage in the 2nd time.
Any suggestions?
class MyPdfWriter
{
public MyPdfWriter()
{
//generate doc, file stream etc.
_document = _document = new PdfDocument();
_document.SetMargins(15, 15, 50, 50);
_document.SetPageSize(PageSize.A4);
_fs = new FileStream("myTest.pdf", FileMode.Create);
_writer = PdfWriter.GetInstance(_document, _fs);
_writer.PageEmpty = false;
_writer.PageEvent = new PdfPage(reportDetails.Header,reportDetails.Footer,reportDetails.LogoImage, reportDetails.ReportFileName);
//open doc
_document.Open();
//add some content
var table = new PdfPTable(1);
table.AddCell("bla bla");
_document.Add(titleTable);
//close doc, stream etc.
if (_document != null && _document.IsOpen())
{
_document.Close();
_document = null;
}
if (_writer != null)
{
_writer.Close();
_writer = null;
}
if (_fs != null)
{
_fs.Close();
_fs.Dispose();
}
}
}
class PdfPage : PdfPageEventHelper
{
public override void OnEndPage(PdfWriter writer, Document document)
{
var footer = new PdfPTable(2);
//Write some cells to footer
//....
//init the width
footer.TotalWidth = (footer.TotalWidth.CompareTo(0f) != 0) ? footer.TotalWidth : document.PageSize.Width;
//write the table with WriteSelectedRows
footer.WriteSelectedRows(0, -1, document.LeftMargin, footer.TotalHeight + 10f, writer.DirectContent);
var Header = new PdfPTable(2);
//Write some cells to Header
//....
//init the width
Header.TotalWidth = (Header.TotalWidth.CompareTo(0f) != 0) ? Header.TotalWidth : document.PageSize.Width;
//write the table with WriteSelectedRows
Header.WriteSelectedRows(0, -1, document.LeftMargin, document.PageSize.Height - 10,
writer.DirectContent);
}
}
I currently have the following class that I'm trying to add a Hashtable of metadata properties to a PDF. The problem is, even though it appears to assign the hashtable to the stamper.MoreInfo property it doesn't appear to save the MoreInfo property once the stamper is closed.
public class PdfEnricher
{
readonly IFileSystem fileSystem;
public PdfEnricher(IFileSystem fileSystem)
{
this.fileSystem = fileSystem;
}
public void Enrich(string pdfFile, Hashtable fields)
{
if (!fileSystem.FileExists(pdfFile)) return;
var newFile = GetNewFileName(pdfFile);
var stamper = GetStamper(pdfFile, newFile);
SetFieldsAndClose(stamper, fields);
}
string GetNewFileName(string pdfFile)
{
return fileSystem.GetDirectoryName(pdfFile) + #"\NewFileName.pdf";
}
static void SetFieldsAndClose(PdfStamper stamper, Hashtable fields)
{
stamper.MoreInfo = fields;
stamper.FormFlattening = true;
stamper.Close();
}
static PdfStamper GetStamper(string pdfFile, string newFile)
{
var reader = new PdfReader(pdfFile);
return new PdfStamper(reader, new FileStream(newFile, FileMode.Create));
}
}
Any ideas?
As always, Use The Source.
In this case, I saw a possibility fairly quickly (Java source btw):
public void close() throws DocumentException, IOException {
if (!hasSignature) {
stamper.close( moreInfo );
return;
}
Does this form already have signatures of some sort? Lets see when hasSignatures would be true.
That can't be the case with your source. hasSignatures is only set when you sign a PDF via PdfStamper.createSignature(...), so that's clearly not it.
Err... how are you checking that your MoreInfo was added? It won't be in the XMP metadata. MoreInfo is added directly to the Doc Info dictionary. You see them in the "Custom" tab of Acrobat (and most likely Reader, though I don't have it handy at the moment).
Are you absolutely sure MoreInfo isn't null, and all its values aren't null?
The Dictionary is just passed around by reference, so any changes (in another thread) would be reflected in the PDF as it was written.
The correct way to iterate through a document's "Doc info dictionary":
PdfReader reader = new PdfReader(somePath);
Map<String, String> info = reader.getInfo();
for (String key : info.keySet()) {
System.out.println( key + ": " + info.get(key) );
}
Note that this will go through all the fields in the document info dictionary, not just the custom ones. Also be aware that changes made the the Map from getInfo() will not carry over to the PDF. The map is new'ed, populated, and returned.