Pass a bitmap object to Interop Word function that is expecting a filename string for a bitmap file - vb.net

The title sounds insane but bear with me. This is a problem that could exist with any object.
I am generating a bitmap object in memory and would like to pass it directly to another function that expects the path of a bitmap file. The simple solution is to write the file to disk, call the function against the file, and then delete the file. I don't want to do that. If I am pushing a high volume of image objects into a Word document with a VSTO add-in, it doesn't make sense to thrash the disk when the whole thing could be done in memory.
I guess I am looking for one of three things: a different function for inserting a picture into a Word document that accepts a bitmap object; a way to pass a file system object that actually points to memory (not a RAM disk, but a "RAM file"?); or a way to wire "Image.Save" directly to the reader behind "AddPicture" without actually creating a file on disk.
Hopefully, there is a better way of doing this.
Here is the code example:
' Requires: Imports System.IO and Imports System.Drawing.Imaging
Dim newImage = GenerateImage(InputString, SelectedFormat)
Dim imagePath = Path.Combine(Path.GetTempPath(), Path.GetRandomFileName())
newImage.Save(imagePath, ImageFormat.Png)
Try
    With Globals.ThisAddIn.Application
        .Selection.InlineShapes.AddPicture(imagePath)
    End With
Finally
    ' Remove the temporary file even if AddPicture throws
    File.Delete(imagePath)
End Try

Word can't "stream" content (see "Background", below), so your choices are 1) the Clipboard or 2) wrapping the bitmap in valid Word Open XML in the OPC flat-file format, which means first converting the bitmap to base64.
For the first, you can use standard .NET methods to place the information on the Clipboard in the format you want Word to use. In the Word "interop", the Paste or PasteSpecial methods will insert it. The argument against this approach is, as ever, "interfering" with the user's Clipboard.
Using Word Open XML is as close as you can get to "streaming" content into Word, using the Range.InsertXML method.
Word documents (and other Office files) are essentially "zip packages" of XML and binary files that together make up the document. It's possible to create and edit these files without opening them in the Word (Office) application, which makes the format suitable for server-side work. Any tool that can work with zip files and XML can be used for this; the standard choice is Microsoft's Open XML SDK, which offers an API covering the Office file content.
Alone among the Office applications, Word lets the developer read and write content in the open document using the OPC flat-file format. This format "concatenates" the entire content of the zip package into a single XML string. The Word object model's Range.InsertXML method writes content in this format to a document open in the Word application.
How to convert a zip package into the OPC flat-file format is described in this blog article. The minimal Word Open XML needed for a valid flat-file package is described in this article, which has a section specifically about working with graphics.
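To make the base64 step concrete, here is a minimal sketch of how the image part of a flat-file package could be built. It is written in Python rather than VB.NET purely as a language-neutral illustration; the function name image_part_xml and the part name /word/media/image1.png are my own assumptions, not anything Word mandates. Note that this is only the binary part: a package that Range.InsertXML accepts also needs the relationship parts and the document XML that references the image, as covered in the articles mentioned above.

```python
import base64

# Namespace used by the flat OPC ("xmlPackage") format
PKG_NS = "http://schemas.microsoft.com/office/2006/xmlPackage"


def image_part_xml(image_bytes, part_name="/word/media/image1.png",
                   content_type="image/png"):
    """Return a <pkg:part> element that carries the image as base64 text.

    part_name and content_type are illustrative defaults (assumptions).
    """
    encoded = base64.b64encode(image_bytes).decode("ascii")
    # Split the base64 text into 76-character lines, the conventional MIME width
    lines = [encoded[i:i + 76] for i in range(0, len(encoded), 76)]
    wrapped = "\n".join(lines)
    return (
        f'<pkg:part xmlns:pkg="{PKG_NS}" pkg:name="{part_name}" '
        f'pkg:contentType="{content_type}" pkg:compression="store">'
        f"<pkg:binaryData>{wrapped}</pkg:binaryData></pkg:part>"
    )
```

In the add-in, the bytes would come from saving the bitmap to a MemoryStream instead of a temporary file, so nothing touches the disk.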
Background
Word is based on very old technology, dating from the late 1980s. By the mid-1990s it had reached a very high standard as a professional word processor, and what has happened to it since has mostly been "sugar coating": adding a bit of this and a bit of that to bring it closer to HTML and page layout. But the core of the application remains the same... and part of that means Word isn't able to do many of the things the modern developer expects, such as "streaming" data in and out.

Related

Space getting added between characters while writing to PDF using binary write

Here is a screenshot of the issue (screenshot not included).
Here is the sample code.
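' Placeholder payload; a String cannot be assigned to a Byte() directly
Dim rawData As Byte() = System.Text.Encoding.UTF8.GetBytes("sample data")
Response.ContentType = "application/pdf"
Response.ContentEncoding = System.Text.Encoding.UTF8
Response.BinaryWrite(rawData)
Response.End()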
The underlying issue here is that you actually are not writing a PDF at all!
Your code essentially returns plain text and then claims that it is a PDF. That claim doesn't change the data in any way: it remains text and doesn't become a PDF.
The PDF viewer you use apparently attempts to display what it received nonetheless, but the result is unsatisfying (a proportional font seems to be laid out as if it were monospaced).
If you actually want to return a PDF, you have to explicitly create one. PDF is a complicated binary format, best created using a dedicated library.
Look for PDF libraries for your environment. Some have explicit ways to add table or paragraph structures to the PDF, and some create content by converting from another structured format, e.g. HTML.
The output of these libraries is binary data in PDF format, which you can return from your code using Response.BinaryWrite.
Recently a number of questions have shown people taking data in text or HTML format, returning it with some binary content type (PDF in this question, MS Office formats in others), and then assuming they have generated a file in that format.
This is wrong: claiming a format doesn't transform the data into that format!
All that setting the content type does is inform the client what kind of viewer to use to open the data.
This anti-pattern probably arose because MS Word (and most likely other word processors, too) can also open plain text and HTML files and display them fairly well. Thus, at first glance, the anti-pattern appears to work.
If you promised your client that your application returns MS Office documents, though, don't return HTML or plain text and claim it is an Office document; create actual MS Office documents instead. Otherwise knowledgeable clients will not accept your implementation, and clients who did accept it will eventually be told by knowledgeable users that you cheated them, which will at the very least damage your reputation.
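A quick way to see the point about the content type is to look at the bytes themselves. The sketch below (Python, for illustration only; the helper name looks_like_pdf is hypothetical) checks for the "%PDF-" marker that a real PDF file starts with. It is a sanity check, not a validator, but it shows that labelling text as application/pdf leaves the payload unchanged.

```python
def looks_like_pdf(payload: bytes) -> bool:
    """Return True if the payload begins with the PDF header marker."""
    return payload.startswith(b"%PDF-")


# The payload from the question is just UTF-8 text, whatever the header claims
text_payload = "sample data".encode("utf-8")
print(looks_like_pdf(text_payload))      # False

# Output produced by a PDF library begins with a version header like this
print(looks_like_pdf(b"%PDF-1.7\n..."))  # True
```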

Import .docx contents into MS Access

I began writing a docx document for a project of mine.
Recently, I realized that it would be easier to manage that data if it was in a database.
So, I wanted to import that data into MS Access automatically, to avoid copying and pasting the data manually.
Is there any way to do it? I have only encountered ways of opening the Word application via Access. I also know that docx has an XML structure, so I imagine that if I can open that structure, it would be easy to write a parser in VBA.
There are two basic ways information can be taken out of a Word document and put into an Access database: automating the Word object model using VBA code running in either Word or Access OR extracting the WordOpenXML that makes up the Word document. You indicate you lean towards the second option.
Here, again, there are a number of approaches available:
Use VBA in Word or Access to extract the WordOpenXML of the document open in the Word application user interface.
Use VBA in Access together with non-VBA tools to "crack open" the Zip file and extract the XML.
Use the tools available in the .NET Framework to extract the content of the ZIP file and write it to Access using an OLE DB connection.
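As a rough illustration of the "crack open the Zip file" approaches listed above, here is a sketch in Python (not VBA or .NET) that reads the main document part and returns the text of each paragraph. The function name docx_paragraphs is hypothetical, and the code assumes the main part lives at word/document.xml, which is the usual location but is strictly determined by the package relationships.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML main namespace
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def docx_paragraphs(docx_file):
    """Return the plain text of each paragraph in the main document part.

    docx_file may be a path or a file-like object.
    Assumes the main part is stored at word/document.xml.
    """
    with zipfile.ZipFile(docx_file) as package:
        xml_bytes = package.read("word/document.xml")
    root = ET.fromstring(xml_bytes)
    paragraphs = []
    for para in root.iter(f"{{{W_NS}}}p"):
        # Concatenate the text runs (w:t elements) inside this paragraph
        runs = [t.text or "" for t in para.iter(f"{{{W_NS}}}t")]
        paragraphs.append("".join(runs))
    return paragraphs
```

Each returned string could then be written to an Access table through whatever database connection is available; formatting is lost in this form, which is why the full WordOpenXML still matters if the document has to be recreated.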
I understand your goal is to be able to recreate the document at a later point for printing, so you want to preserve all the formatting. In addition, you want to be able to read the content from within Access.
I believe this will require a minimum of four fields in the Access table:
1. ID
2. Title
3. Text of song
4. The complete WordOpenXML for re-creating the document
You don't mention (4) in the discussion and problem description, but if you want to store the formatting AND you want to be able to read the content I believe this is necessary. While WordOpenXML is "readable", there's a lot of mark-up in there which doesn't make reading comfortable.
All things being equal, I'd go for either VBA working on the open Word document or the .NET approach using the Open XML SDK (a free .NET library you can reference in Visual Studio and distribute with your solutions).
One important thing to keep in mind is storing the Word Open XML in the database. Unless something has changed in Access, you can't store the ZIP file - you need a "streamable" format. That would be the OOXML OPC flat-file format.
When you read the WordOpenXML from a document using VBA, that's what you get, which is why that would be an option for me. The Open XML SDK doesn't have that option, but there is code available from Eric White's blog for doing this.
When you later want to recreate and print the document it should be enough to stream the WordOpenXML to a file with the .xml extension. Or you could convert it back to a docx zip file (same blog).

Fastest way to save OpenXML Visual Basic

I'm working on a system that, using Interop, reads in a Word document (which acts as a template) and an Excel document. The program then uses OpenXML to replace keywords in the Word document based on data in the Excel document, and saves the converted document as a PDF. However, the process is slow because it involves a lot of read and write requests:
1. It saves a copy of the template to the directory where the converted files are to go.
2. It processes the copy, using OpenXML to substitute the Excel data, which has been read into a DataTable.
3. It converts the Word file to PDF and deletes the Word document.
It has to do this for each row in the Excel document (which means a lot of reads and writes to storage). I was wondering if there is a way to keep the processing in memory? Unfortunately, the OpenXML library (a modified version I use) needs the path to the actual Word file and can't just use a Word document object.
I'm thinking of doing the deleting on a directory level instead of file level in an attempt to speed it up. I'm not sure how much of a difference this will make.
Are there any blatant optimisations I'm missing?

Convert PDF OLE objects back to file (attachments) in Lotus Notes?

I have a database with tons of PDF documents embedded as OLE objects in Notes RichText fields. Those are not compatible with XPages, so I need to convert the OLE objects into file attachments.
How can I do that in an automated fashion? I know that it must run in a Notes client (must it?). Or is there a POI way to extract them?
Clarification
I can extract the blob (into memory if I want), but writing it out to disk doesn't create a PDF file, since that blob is an OLE container. So I see two possible paths:
Activate the OLE object and use a method in there
Read the blob and have something that extracts the PDF part (possibly Apache POI)
But I haven't tried either of these approaches and was wondering if some advice could save me hours of testing.
Would it be possible with DXL tools? I've worked with the DXL exporter to extract embedded images from a document; maybe this is also doable with OLE objects?
I used a slightly changed version of the EmbeddedImage object from the LotusScript Gold Collection project on OpenNTF.
This library contains an object, Embeddedimagelist, which searches the DXL for picture tags and tries to parse their contents. Maybe this would also be applicable to embedded OLE objects.
I'd think that searching for %PDF and then saving everything from that point on as a file should give you the PDF. Theoretically there could be a bunch of things in an OLE file, but in most cases you'll get your file simply prefixed with an OLE header (or whatever it's called).
I've used this approach on one occasion (not for PDF, though) and it seemed to work fine.
I guess that's what the OpenNTF approach that jjtbsomhorst is talking about is based upon :-)
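The "search for %PDF and save the rest" idea can be sketched roughly like this (in Python rather than LotusScript; the function name extract_pdf is hypothetical). Trimming at the last %%EOF marker is my own addition to the idea, and the whole thing assumes the PDF bytes sit contiguously inside the blob. An OLE container is allowed to scatter a stream across sectors, so treat this as a heuristic that works "in most cases", not a parser.

```python
def extract_pdf(blob: bytes) -> bytes:
    """Cut the embedded PDF out of an OLE blob by looking for its markers.

    Heuristic: assumes the PDF is stored as one contiguous run of bytes.
    """
    start = blob.find(b"%PDF-")
    if start == -1:
        raise ValueError("no PDF header found in blob")
    # Trim anything after the last end-of-file marker (an added assumption)
    end = blob.rfind(b"%%EOF")
    if end == -1 or end < start:
        return blob[start:]
    return blob[start:end + len(b"%%EOF")]
```

The returned bytes can then be written to a file or attached to the document directly, without activating the OLE object.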

Search MS Word binary file for specific content

I have some .doc binary files stored in my database and I would now like to search them all (without converting them to .doc) to see which ones contain the word "hello", for instance.
Is there any way to do this search in the binary file?
You could go down the route of using commercial tools. Aspose.Words can load a document from a stream and has all sorts of methods for finding text within the document.
If you have the stream from the DB, then your code would look like this:
Aspose.Words.Document doc = new Aspose.Words.Document(streamObjectFromDatabase);
if (doc.GetText().ToLower().Contains("hello world"))
MessageBox.Show("Hello World exists");
Note: The benefit of this tool is that it does not require Word objects to be installed and it can work with streams in memory.
Not without a lot of pain, as far as I can tell. According to Wikipedia, Microsoft has within the past few years finally released the .doc specification. So you could create a parser based on the spec if you have the time, assuming all of your documents are in the same version of the .doc format.
Of course you could just search for the text you're looking for amid all the binary data, on the assumption that the actual text is stored as plain text. But even if that assumption were true, how could you be sure that the plain text you found was the actual document text, and not some of the document meta data that's also stored in plain text? And there's always the off chance that the binary data will match your text pattern.
If the Word libraries are available to you, I would go that route. If not, a homegrown parser may be your least bad option.
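For completeness, the "just search the binary data" shortcut would look something like the sketch below (Python; naive_doc_contains is a hypothetical name). As far as the published specification describes, text in a .doc file may be stored either as 8-bit characters or as UTF-16, so the sketch looks for both encodings. All the caveats above still apply: a hit may come from metadata or from a chance match, and a miss does not prove the word is absent.

```python
def naive_doc_contains(blob: bytes, word: str) -> bool:
    """Case-sensitive raw byte search for a word in a .doc blob.

    Heuristic only: this does not parse the .doc structure at all.
    """
    needles = [word.encode("utf-16-le")]
    try:
        needles.append(word.encode("cp1252"))
    except UnicodeEncodeError:
        pass  # the word has characters outside the 8-bit code page
    return any(needle in blob for needle in needles)
```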