Import .docx contents into MS Access - vba

I began writing a docx document to do a project of mine.
Recently, I realized that it would be easier to manage that data if it was in a database.
So, I wanted to import that data into MS Access automatically, to avoid copying and pasting the data manually.
Is there anyway to do it? I have only encontered ways of opening Word application via Access. I also know that docx has a XML structure, so I imagine if I can open that structure, it would be easy to do a parser in VBA

There are two basic ways information can be taken out of a Word document and put into an Access database: automating the Word object model using VBA code running in either Word or Access OR extracting the WordOpenXML that makes up the Word document. You indicate you lean towards the second option.
Here, again, there are a number of approaches available:
Use VBA in Word or Access to extract the WordOpenXML of the document open in the Word application user interface.
Use VBA in Access together with non-VBA tools to "crack open" the Zip file and extract the XML.
Use the tools available in the .NET Framework to extract the content of the ZIP file and write it to Access using an OLE DB connection.
I understand your goal is to be able to recreate the document at a later point for printing, so you want to preserve all the formatting. In addition, you want to be able to read the content from within Access.
I believe this will require a minimum of four fields in the Access table:
ID
Title
Text of song
The complete WordOpenXML for re-creating the document
You don't mention (4) in the discussion and problem description, but if you want to store the formatting AND you want to be able to read the content I believe this is necessary. While WordOpenXML is "readable", there's a lot of mark-up in there which doesn't make reading comfortable.
All things being equal, I'd go for either VBA working on the open Word document or the .NET approach, using the Open XML SDK (free download .NET library you can reference in Visual Studio and distribute with solutions).
One important thing to keep in mind is storing the Word Open XML in the database. Unless something has changed in Access, you can't store the ZIP file - you need a "streamable" format. That would be the OOXML OPC flat-file format.
When you read the WordOpenXML from a document using VBA, that's what you get, which is why that would be an option for me. The Open XML SDK doesn't have that option, but there is code available from Eric White's blog for doing this.
When you later want to recreate and print the document it should be enough to stream the WordOpenXML to a file with the .xml extension. Or you could convert it back to a docx zip file (same blog).

Related

Pass a bitmap object to Interop Word function that is expecting a filename string for a bitmap file

The title sounds insane but bear with me. This is a problem that could exist with any object.
I am generating a bitmap object in memory and I would like to pass it directly to another function that wants to open a bitmap file. The simple solution is to write the file to disk, call the function against the file, and then delete the file. I don't want to do that. If I am pushing a high volume of image objects in to a Word document with a VSTO add-in it doesn't make sense to thrash my disk for no reason when the whole thing could be done in memory.
I guess I am looking for a different function to insert a picture in to a Word document that accepts a bitmap object. Or a way to pass a filesystem object that actually points to memory (Not a RAMDisk, but a RAMFile?). Or a way to wire the "Image.Save" directly to the reader of the "AddPicture" function without actually making a file on disk.
Hopefully, there is a better way of doing this.
Here is the code example:
Dim newImage = GenerateImage(InputString, SelectedFormat)
Dim imagePath = Path.Combine(Path.GetTempPath(), Path.GetRandomFileName())
newImage.Save(imagePath, ImageFormat.Png)
With Globals.ThisAddIn.Application
.Selection.InlineShapes.AddPicture(imagePath)
End With
File.Delete(imagePath)
Word can't "stream" (see "Background", below) content, so your choices are 1) The Clipboard or 2) wrapping the bitmap in valid, Word Open XML OPC flat-file format, which means first converting the bitmap to base64.
For the first, you can use standard .NET methods to place the information on the Clipboard in the format you want Word to use. In the Word "interop", the Paste or PasteSpecial methods will insert it. The argument against this approach is, as ever, "interfering" with the user's Clipboard.
Using Word Open XML is as close as you can get to "streaming" content into Word, using the Range.InsertXML method.
Word documents (and other Office files) are essentially "zip packages" of XML and binary files that together make up the document. It's possible to create and edit these files without opening them in the Word (Office) application, which makes the format suitable for server-side work. Any tool that can work with zip files and xml can be used for this; standard is the Microsoft Open XML SDK which offers a complete API of the Office content.
Word, alone, of all the Office applications enables the developer to read and write content in the opened Word document using the OPC flat-file standard. This "concatenates" the entire content of the zip package into an XML String. The Word object model's Range.InsertXML method is used to write content in this format to a Word document open in the Word application.
Information on how to convert a zip package into OPC flat file can be found in this blog article. Information concerning minimal Word Open XML to have a valid OPC version is described in this article; there is a section in there specifically about working with graphics.
Background
Word is based on very old technology - late 1980's. By the mid-1990's it reached a very high standard as a professional word processor and what has happened with it since has mostly been "sugar coating" - adding a bit of this and a bit of that to bring it closer to HTML / page layouting. But the core of the application remains the same... and part of that means Word isn't able to do many of the things the modern developer expects - such as "streaming" data in and out.

Easiest way to add raw Open XML code to a Word document

I want to insert a chunk of formatted text into a Word document in VBA. Since this will be done server-side, I am not advised take these chunks from another Word document using Office Interop (link), so I presume it would be easiest to use Open XML like this <w:p><w:r><w:t>String from WriteToWordDoc method.</w:t></w:r></w:p> Sadly Application.ActiveDocument.Range.InsertXML fails. So what other quick and dirty alternatives do I have?
For the "bare necessities" see this MSDN article:
https://msdn.microsoft.com/en-us/library/office/dn423225.aspx?f=255&MSPPError=-2147217396
It describes what the minimum requirements are for inserting Open XML into an open Word document. Don't worry that the discussion targets Web add-ins - the principle is the same for any API that inserts WordOpenXML into a Word document at run-time.
I may be misunderstanding where "server-side" is involved in your process, so forgive me if the following is not relevant to your situation: Note that using VBA server-side is a bad idea - Word is not designed to run in a server environment. Better would be to use the Open XML SDK both to retrieve and write the information between two Word documents.

Create PDF Document in VBA

Using VBA outside of the MS-Office suite of applications, is there a good way to create a PDF document, or another light-weight document that can be converted to a PDF?
I have data within classes, which I want to get into a PDF, how do I do that quickly, without resorting to opening up MS Word, MS Excel, etc?
This link seems like it might be helpful with solving your issue.
http://forums.adobe.com/thread/840511
According to the information there you can print to the adobe pdf virtual printer and it will generate the file. The people on that forum were doing something with emailing the file afterwards and generating the file from access data, but it looks like a solution that will work for any VBA project.

Convert PDF OLE objects back to file (attachments) in Lotus Notes?

I have a database with tons of PDF documents embedded as OLE objects in Notes RichText fields. Those are not compatible with XPages, so I need to convert the OLE objects into file(attachment)s.
How can I do that in an automatic fashion (I know that it must run in a Notes client (must it?) - or is there a POI way to extract them?
Clarification
I can extract the blob (into memory if I want), but writing it out to disk doesn't create a PDF File since that blob is an OLE container. So I see 2 possible path:
Activate the OLE object and use a method in there
Read the blob and have something that extracts the PDF part (possibly Apache POI)
But I haven't touched any of these approaches and was wondering if some advice could save me hours of tests
Would it be possible with dxl tools? I've worked with the dxl exporter to extract embedded images from a document maybe this is also doable with ole objects?
I used a slightly changed version of the EmbeddedImage object of the lotusscript gold collection project on openntf
This library contains an object Embeddedimagelist which searches the DXL for picture tags and tries to parse its contents. Maybe this would also be applicable with embedded ole objects.
I'd think that something like searching for %PDF and then saving everything since as a file should five you PDF. Theoretically there could be a bunch of things in OLE file, but in most cases you'll get you file simply prefixed with an OLE header (or whatever it's called).
I've used this approach in one occasion (not for PDF though) and it seemed working fine.
I guess it's what openntf approach that jjtbsomhorst is talking about is based upon :-)

I need a (preferably free) PDF/Word generator .Net component that can work from a document template

I'm looking for a .Net component that will allow me to generate Word and/or PDF documents.
This must work on the server without MS Office installation. Preferably free. Also, it needs to be able to generate the documents based on an existing template of some sort i.e. I don't want to generate the whole document from scratch but allow a number of different templates that all have similar content that comes from elsewhere (e.g. database, XML files etc).
My initial investigations have turned up iTextSharp (but not sure if it can work from templates).
Any help that can expedite my investigation time will be much appreciated.
Thanks
I use ActivePDF at work with .NET - give it some HTML and it will output a pdf doc. However it isn't free - but we did look at a few other ways and this was 1
http://pdfcrowd.com/html-to-pdf-api/
It doesn't do word documents but converts html (your template) to pdf