I'm looking to serialise individual PowerPoint slides and persist them on disk. The goal is to access the slide data later on and add them to other presentations. For our add-in, the number of slides to be persisted could grow fast and we'd rather not store them in PowerPoint files for performance reasons.
I've tried persisting the data via the clipboard for Office 2010-2016: after copying a slide, I read the "PowerPoint 14.0 Slides Package" data format from the clipboard and write it to a file. Later I read the file, put its contents back on the clipboard, and paste them into the Slides collection of the target presentation.
But I can't make this work for Office 2007: there is no "slides package" format on the clipboard. The only clipboard format that looks like it could hold the slide data is "Embedded Object", but that doesn't seem to work, as I'm not able to paste the slide again.
Is there a way to solve this? And am I missing a more elegant solution to the problem?
Here is my current code in C#:
public static string PptSlideClipFormat = "PowerPoint 14.0 Slides Package";
public static void serializeSlide(PowerPoint.Slide slide, string filePath)
{
slide.Copy();
var dataObject = Clipboard.GetDataObject();
MemoryStream myStream = (MemoryStream)Clipboard.GetData(PptSlideClipFormat);
IsolatedStorageFileStream fs = new IsolatedStorageFileStream(filePath, FileMode.Create, Utils.Common.isoStore);
try
{
myStream.WriteTo(fs);
}
catch
{
throw new Exception("Failed to serialize slide clipboard data");
}
finally
{
fs.Close();
}
Utils.Common.clearClipboard();
}
[…]
public static void deserializeSlide(string filePath)
{
IsolatedStorageFileStream fs = new IsolatedStorageFileStream(filePath, FileMode.OpenOrCreate, Utils.Common.isoStore);
try
{
Clipboard.SetData(PptSlideClipFormat, fs);
}
catch
{
throw new Exception("Failed to deserialize slide clipboard data");
}
finally
{
fs.Close();
}
}
I have a method that uses Microsoft.Office.Interop to read a Word document, edit it, and write it as a PDF. I have a separate method that uses Itext7 to read this PDF and write it to a different PDF that can be viewed and printed, but cannot be easily altered.
The first method reads the Word document from disk; however, I've been asked to make it read the document from a SQL query (the document is stored as varbinary) and write the final result back as varbinary, without using any intermediate files on disk. I think I need to read these as "streams".
Here's what I have:
class clsMakeCert
{
string myName;
public string myDate;
public clsMakeCert(string name, DateTime date)
{
myName = name;
myDate = date.ToString("MM/dd/yyyy");
}
public void createCertPdf(string certFilename)
{
// Get the document out of the SQL table
System.Data.DataTable dtContents = new System.Data.DataTable();
SqlDataReader rdr_Contents = null;
using (SqlConnection conn = new SqlConnection("Server=KGREEN3-LT\\SQLEXPRESS;Initial Catalog=SAN;Integrated Security=SSPI"))
{
conn.Open();
SqlCommand cmd = new SqlCommand("Select [file_data] From cert_files where filename=@certFilename", conn);
{
cmd.CommandType = CommandType.Text;
cmd.Parameters.AddWithValue("@certFilename", certFilename);
rdr_Contents = cmd.ExecuteReader(CommandBehavior.CloseConnection);
dtContents.Load(rdr_Contents);
}
}
byte[] byteArray = (byte[])dtContents.Rows[0]["file_data"];
// Put it into a word document
Application wordApp = new Application { Visible = false };
Document doc = new Document();
using (MemoryStream stream = new MemoryStream())
{
stream.Write(byteArray, 0, (int)byteArray.Length);
doc = wordApp.Documents.Open(stream, ReadOnly: false, Visible: false);
doc.Activate();
FindAndReplace(wordApp, "First Middle Last", myName);
FindAndReplace(wordApp, "11/11/1111", myDate);
}
return ;
}
}
It doesn't look like the Open method will accept a stream, though.
It looks like I can use Open XML to read/edit the docx as a stream, but it won't convert to PDF.
It looks like I can use Interop to read/edit the docx AND write to PDF, but I can't do it as a stream (only as a file on disk).
Is there a way to get Interop to read a stream (i.e. a file loaded from a varbinary)?
No, the Word "Interop" does not support streaming. The closest it comes (and it's the only Office application that has this capability) is to pass in the necessary Word Open XML in the OPC flat file format using the object model's InsertXML method. It's not guaranteed, however, that the result will be an exact duplicate of the Word Open XML file being passed in, as the target document's settings could override some of the incoming settings.
I want to print a node to a PDF file using the "Microsoft Print to PDF" printer. Assuming the Printer object has already been obtained, I have the following function, which works perfectly.
public static void printToPDF(Printer printer, Node node) {
PrinterJob job = PrinterJob.createPrinterJob(printer);
if (job != null) {
job.getJobSettings().setPrintQuality(PrintQuality.HIGH);
PageLayout pageLayout = job.getPrinter().createPageLayout(Paper.A4, PageOrientation.PORTRAIT,
Printer.MarginType.HARDWARE_MINIMUM);
boolean printed = job.printPage(pageLayout, node);
if (printed) {
job.endJob();
} else {
System.out.println("Printing failed.");
}
} else {
System.out.println("Could not create a printer job.");
}
}
The only issue I have is that a dialog box pops up and asks for a destination path to save the PDF. I have been struggling to find a way to set the path programmatically, but with no success. Any suggestions? Thank you in advance.
After some more research I came up with an ugly hack. I accessed the private jobImpl field of PrinterJob and took the attribute set out of it. I then inserted the Destination attribute, and apparently it works as requested. I know it is not nice, but it is workable. If you have a nicer suggestion, please do not hesitate to post it.
try {
java.lang.reflect.Field field = job.getClass().getDeclaredField("jobImpl");
field.setAccessible(true);
PrinterJobImpl jobImpl = (PrinterJobImpl) field.get(job);
field.setAccessible(false);
field = jobImpl.getClass().getDeclaredField("printReqAttrSet");
field.setAccessible(true);
PrintRequestAttributeSet printReqAttrSet = (PrintRequestAttributeSet) field.get(jobImpl);
field.setAccessible(false);
printReqAttrSet.add(new Destination(new java.net.URI("file:/C:/deleteMe/wtv.pdf")));
} catch (Exception e) {
System.err.println(e);
}
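The hack above hard-codes the destination as a URI string. As a minimal sketch (not part of the original answer), the Destination attribute can instead be built from a java.nio.file.Path, so that drive letters and characters such as spaces are escaped for you. The class name PdfDestination and the output path are hypothetical; only javax.print and java.nio classes from the JDK are used, and how you obtain the PrintRequestAttributeSet (for example via the reflection shown above) is unchanged.

```java
import java.net.URI;
import java.nio.file.Path;
import java.nio.file.Paths;
import javax.print.attribute.HashPrintRequestAttributeSet;
import javax.print.attribute.PrintRequestAttributeSet;
import javax.print.attribute.standard.Destination;

// Hypothetical helper: builds the Destination attribute from a Path.
final class PdfDestination {

    // Converts the output path to an absolute file: URI.
    // Path.toUri() takes care of escaping (e.g. spaces become %20).
    public static URI toFileUri(Path output) {
        return output.toAbsolutePath().toUri();
    }

    // Adds the Destination attribute to the set; an existing
    // Destination in the set is replaced.
    public static void setDestination(PrintRequestAttributeSet attrs, Path output) {
        attrs.add(new Destination(toFileUri(output)));
    }

    public static void main(String[] args) {
        // Stand-in for the attribute set extracted from the print job.
        PrintRequestAttributeSet attrs = new HashPrintRequestAttributeSet();
        setDestination(attrs, Paths.get("out", "my report.pdf"));
        Destination d = (Destination) attrs.get(Destination.class);
        System.out.println(d.getURI());
    }
}
```

In the hack above, the last line inside the try block would then become a call such as PdfDestination.setDestination(printReqAttrSet, Paths.get(...)).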
OLD TITLE: iTextSharp convert HTML to PDF "The document has no pages."
I am using iTextSharp and xmlworker to convert html from a view to PDF in ASP.NET Core 2.1
I tried many code snippets I found online but all generate an exception:
The document has no pages.
Here is my current code:
public static byte[] ToPdf(string html)
{
byte[] output;
using (var document = new Document())
{
using (var workStream = new MemoryStream())
{
PdfWriter writer = PdfWriter.GetInstance(document, workStream);
writer.CloseStream = false;
document.Open();
using (var reader = new StringReader(html))
{
XMLWorkerHelper.GetInstance().ParseXHtml(writer, document, reader);
document.Close();
output = workStream.ToArray();
}
}
}
return output;
}
UPDATE 1
Thanks to @Bruno Lowagie's advice, I upgraded to iText7 and pdfHTML, but I can't find many tutorials about it.
I tried this code:
public static byte[] ToPdf(string html)
{
html = "<html><head><title>Extremely Basic Title</title></head><body>Extremely Basic Content</body></html>";
byte[] output;
using (var workStream = new MemoryStream())
using (var pdfWriter = new PdfWriter(workStream))
{
using (var document = HtmlConverter.ConvertToDocument(html, pdfWriter))
{
//Passes the document to a delegated function to perform some content, margin or page size manipulation
//pdfModifier(document);
}
//Returns the written-to MemoryStream containing the PDF.
return workStream.ToArray();
}
}
but I get
System.NullReferenceException
when I call HtmlConverter.ConvertToDocument(html, pdfWriter)
Am I missing something?
UPDATE 2
I tried to debug using source code.
This is the stack trace
System.NullReferenceException
HResult=0x80004003
Message=Object reference not set to an instance of an object.
Source=itext.io
StackTrace: at iText.IO.Font.FontCache..cctor() in S:\Progetti\*****\itext7-dotnet-develop\itext\itext.io\itext\io\font\FontCache.cs:line 76
This is the code that generates the exception:
static FontCache()
{
try
{
LoadRegistry();
foreach (String font in registryNames.Get(FONTS_PROP))
{
allCidFonts.Put(font, ReadFontProperties(font));
}
}
catch (Exception) { }
}
registryNames count = 0 and .Get(FONTS_PROP) throws the exception
UPDATE 3
The problem was related to some sort of cache. I don't really understand which one, but as you can see in the code, the exception was thrown when it tried to load fonts from the cache.
I realized this after trying the same code in a new project, where it worked.
So I cleaned the solution, deleted bin, obj and .vs, killed IIS Express, removed and reinstalled all NuGet packages, then ran it again, and it magically worked.
Then I had to make only one fix to the code:
Instead of HtmlConverter.ConvertToDocument, which generated only a 15-byte document, I used HtmlConverter.ConvertToPdf to generate a full PDF.
Here is the complete code:
public static byte[] ToPdf(string html)
{
using (var workStream = new MemoryStream())
{
using (var pdfWriter = new PdfWriter(workStream))
{
HtmlConverter.ConvertToPdf(html, pdfWriter);
return workStream.ToArray();
}
}
}
I had this EXACT same problem, and after digging down all the way to iText7's FontCache object and getting an error when trying to create my OWN FontProgram to use from a raw TTF file (which also failed with the same null reference error), I finally "solved" my problem.
Apparently iText has some internal errors/exceptions that it just "skips" and "pushes past". I realized by accident that I had "Enable Just My Code" disabled in Visual Studio, so the debugger was breaking in iText7's code as well as mine. The moment I re-enabled it in my Visual Studio settings (Tools > Options > Debugging > General > Enable Just My Code checkbox), the problem magically went away.
So I spent four hours trying to troubleshoot a problem that was in THEIR code, but that they apparently found some way to work around and push through the method anyways even on a null reference failure.
My convert to PDF function is now working just fine.
I was getting this error as well, but noticed it only happened on the first attempted load of the SvgConverter. So I added this at the top of my class, and it seems to have hidden the bug.
using iText.Kernel.Pdf;
using iText.IO.Font;
public class PdfBuilder {
static PdfBuilder() {
try {
FontCache.GetRegistryNames();
}
catch(Exception) {
// ignored... this forces the FontCache to initialize
}
}
...
}
I was using iText 7 and everything worked fine in a console application.
When I used the same code in a Web/Function App project, I started getting the error below.
System.NullReferenceException
HResult=0x80004003
Message=Object reference not set to an instance of an object.
Source=itext.html2pdf
StackTrace:
at iText.Html2pdf.Attach.Impl.Tags.BrTagWorker..ctor(IElementNode element, ProcessorContext context)
at iText.Html2pdf.Attach.Impl.DefaultTagWorkerMapping.<>c.<.cctor>b__1_10(IElementNode lhs, ProcessorContext rhs)
at iText.Html2pdf.Attach.Impl.DefaultHtmlProcessor.Visit(INode node)
at iText.Html2pdf.Attach.Impl.DefaultHtmlProcessor.Visit(INode node)
at iText.Html2pdf.Attach.Impl.DefaultHtmlProcessor.Visit(INode node)
at iText.Html2pdf.Attach.Impl.DefaultHtmlProcessor.Visit(INode node)
at iText.Html2pdf.Attach.Impl.DefaultHtmlProcessor.Visit(INode node)
at iText.Html2pdf.Attach.Impl.DefaultHtmlProcessor.Visit(INode node)
at iText.Html2pdf.Attach.Impl.DefaultHtmlProcessor.Visit(INode node)
at iText.Html2pdf.Attach.Impl.DefaultHtmlProcessor.ProcessDocument(INode root, PdfDocument pdfDocument)
at iText.Html2pdf.HtmlConverter.ConvertToPdf(String html, PdfDocument pdfDocument, ConverterProperties converterProperties)
at iTextSample.ConsoleApp.HtmlToPdfBuilder.RenderPdf() in C:\code\iTextSample.ConsoleApp\HtmlToPdfBuilder.cs:line 227
After some investigation I found that the <br /> tag was the problem. I removed all <br /> tags and it now works fine.
Please help me understand if my solution is correct.
I'm trying to extract text from a PDF file with a LocationTextExtractionStrategy parser. I'm getting exceptions because the ParseContentMethod tries to parse inline images? The code is simple and looks similar to this:
RenderFilter[] filter = { new RegionTextRenderFilter(cropBox) };
ITextExtractionStrategy strategy = new FilteredTextRenderListener(new LocationTextExtractionStrategy(), filter);
PdfTextExtractor.GetTextFromPage(pdfReader, pageNumber, strategy);
I realize the images are in the content stream, but I have a PDF file that fails text extraction because of inline images. It throws an UnsupportedPdfException of "The filter /DCTDECODE is not supported" and then finally fails with an InlineImageParseException of "Could not find image data or EI", when all I really care about is the text. The BI/EI exists in my file, so I assume this failure is caused by the /DCTDECODE exception. But again, I don't care about images; I'm looking for text.
My current solution for this is to add a filterHandler in the InlineImageUtils class that assigns the Filter_DoNothing() filter to the DCTDECODE filterHandler dictionary. This way I don't get exceptions when I have InlineImages with DCTDECODE. Like this:
private static bool InlineImageStreamBytesAreComplete(byte[] samples, PdfDictionary imageDictionary) {
try {
IDictionary<PdfName, FilterHandlers.IFilterHandler> handlers = new Dictionary<PdfName, FilterHandlers.IFilterHandler>(FilterHandlers.GetDefaultFilterHandlers());
handlers[PdfName.DCTDECODE] = new Filter_DoNothing();
PdfReader.DecodeBytes(samples, imageDictionary, handlers);
return true;
} catch (IOException e) {
return false;
}
}
public class Filter_DoNothing : FilterHandlers.IFilterHandler
{
public byte[] Decode(byte[] b, PdfName filterName, PdfObject decodeParams, PdfDictionary streamDictionary)
{
return b;
}
}
My problem with this "fix" is that I had to change the iTextSharp library. I'd rather not do that so I can try to stay compatible with future versions.
Here's the PDF in question:
https://app.box.com/s/7eaewzu4mnby9ogpl2frzjswgqxn9rz5
I have a W9 PDF document that I am filling with data for just a single record per button click. Now the client would like to create a single document where each record is a page in the document. Below is our code to create a PDF for each employee.
protected void lnkFillFields_Click(object sender, EventArgs e)
{
using (Library.Data.PDFData data = new Library.Data.PDFData())
{
try
{
Document document = new Document();
PdfCopy writer = new PdfCopy(document, Response.OutputStream);
document.Open();
foreach (EmployeeData emp in data.sp_select_employee_data())
{
//Creates a PDF from a byte array
PdfReader reader =
new PdfReader((Byte[])data.sp_select_doc(16).Tables[0].Rows[0]["doc"]);
//Creates a "stamper" object used to populate interactive fields
MemoryStream ms = new MemoryStream();
PdfStamper stamper = new PdfStamper(reader, ms);
try
{
//MUST HAVE HERE BEFORE STREAMING!!!
//This line populates the interactive fields with your data.
// false = Keeps the fields as editable
// true = Turns all of the editable fields to their read-only equivalent
stamper.FormFlattening = false;
//fill in PDF here
stamper.Close();
reader.Close();
MergePDFs(writer, ms);
}
catch (Exception ex)
{
throw; // rethrow without resetting the stack trace
}
}
document.Close();
//Stream the file to the user
Response.ContentType = "application/pdf";
Response.BufferOutput = true;
Response.AppendHeader("Content-Disposition", "attachment; filename=W9"+ "_Complete.pdf");
Response.Flush();
Response.End();
}
catch (Exception ex)
{
throw; // rethrow without resetting the stack trace
}
}
}
Inserting a page wasn't the way to go. Instead, merging the documents was essentially what we wanted. Therefore, we have come up with this method:
UPDATE
Below is the method that we came up with that successfully stitches a new PDF to the previous one.
private static void MergePDFs(PdfCopy writer, MemoryStream ms)
{
PdfReader populated_reader = new PdfReader(ms.ToArray());
//Add this pdf to the combined writer
int n = populated_reader.NumberOfPages;
for (int i = 1; i <= n; i++)
{
PdfImportedPage page = writer.GetImportedPage(populated_reader, i);
writer.AddPage(page);
}
}
What we need to do is create all of this in memory, then spit it out to the user for download.
Check out kuujinbo's tutorial here for combining/stitching PDFs together.
Before you do that, you'll obviously need to generate the PDFs, too. You might be tempted to do it all in one pass, which will work but will be harder to debug. Instead, I'd recommend making two passes: the first to create the individual documents and the second to combine them. Your first pass can temporarily write them to either disk or memory. Your code (and kuujinbo's) actually writes directly to the Response stream, which is completely valid, too, but also much harder to debug, especially if you wrap everything in a giant try/catch.
The number of PDFs you're joining and the frequency of generation should determine where you temporarily store the output of the first pass. If you're only doing up to a dozen or two and they're not giant, I would persist each one to a MemoryStream and work with the .ToArray() byte data on that.
If you've got more PDFs than that, or they're fairly large, or this routine gets called often, or you have RAM constraints, you might be better off persisting them to a unique folder first, stitching them together and then deleting that folder.
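The storage decision described above is independent of the PDF library, so here is a rough, language-neutral sketch of it, written in Java with only the standard library. Everything in it is an assumption for illustration: the class name FirstPassStaging, the thresholds (24 documents, 32 MB) and the file naming are hypothetical, and the byte arrays stand in for the PDFs produced by the first pass. In C# the same shape maps onto MemoryStream for the in-memory case and a unique temporary directory for the on-disk case.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper deciding where first-pass documents are kept.
final class FirstPassStaging {
    // Assumed thresholds; tune them for your own workload.
    static final int MAX_IN_MEMORY_DOCS = 24;
    static final long MAX_IN_MEMORY_BYTES = 32L * 1024 * 1024;

    // True when the batch is small enough to keep as byte arrays in memory.
    public static boolean keepInMemory(int docCount, long totalBytes) {
        return docCount <= MAX_IN_MEMORY_DOCS && totalBytes <= MAX_IN_MEMORY_BYTES;
    }

    // Writes each document into a new, unique temp folder; returns the files in order.
    public static List<Path> stageToDisk(List<byte[]> docs) {
        try {
            Path dir = Files.createTempDirectory("pdf-stage-");
            List<Path> files = new ArrayList<>();
            for (int i = 0; i < docs.size(); i++) {
                files.add(Files.write(dir.resolve(i + ".pdf"), docs.get(i)));
            }
            return files;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Deletes the staging folder and its contents (children before parent).
    public static void cleanUp(Path dir) {
        try {
            List<Path> paths;
            try (Stream<Path> walk = Files.walk(dir)) {
                paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
            }
            for (Path p : paths) {
                Files.delete(p);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        List<byte[]> docs = Arrays.asList(new byte[] {1, 2}, new byte[] {3});
        long total = docs.stream().mapToLong(d -> d.length).sum();
        if (keepInMemory(docs.size(), total)) {
            System.out.println("small batch: merge straight from the byte arrays");
        } else {
            List<Path> files = stageToDisk(docs);
            // ... second pass: merge the staged files here ...
            cleanUp(files.get(0).getParent());
        }
    }
}
```

Keeping the decision in one small function makes it easy to adjust the thresholds once you have measured real document sizes and request rates.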