Storing arbitrary meta-data in Microsoft Word document - vba

I need to store custom meta-data in a Word document. The 'document properties' are limited to 255 bytes but I have data which is >> 10k
We are using VBA to write a word extension to interact with our application and want to have our application data stored in the word document. The idea is that that the user can share just the word document without sharing any other data files of our application.
Has anyone ideas how to store arbitrary meta-data efficently in Office 2003+ documents?

Just tried this in the VBA editor immediate window:
ActiveDocument.Variables.Add "foo", String(10000, ".")
? Len(ActiveDocument.Variables("foo"))
10000
Document variables are stored with the document. They seem to go up to 65280 Unicode characters, as I just found out (tested in Word 2003; Update: This limit is the same in Word 2007 and 2010).
If you fear to get near the 60k for your data, I'd recommend either splitting it over several variables, or compressing it before you store it. This post shows some options to do that in VBA.

It fits not exactly your requirements because it does not work with Word 2003. However, it still might be interesting to you (otherwise I recommend you to use document variables as suggested by Tomalak):
In Word 2007 (and later) you may embed arbitrary XML documents in a Word document as Custom XML.
An example is described ins this article on MSDN: How to: Add Custom XML Parts to Documents by Using VSTO Add-Ins

Related

How can I change multiple word files' attribute as 'Read Only Recommended'

I have a daily word which I need to change around 100 word files as 'read-only-recommended' before sending to client.
I cannot figure out a VBA solution to do that, could you give me a hint? Thanks and regards.
There is a Microsoft Community Answers Forum article entitled Batch Editing MS Word Documents that you should read. It provides VBA code for both Windows and Mac versions of Office. In the article's supplied code, there are two locations marked with:
This is where you will insert your code for applying the edits you
want to perform
You would insert:
ActiveDocument.ReadOnlyRecommended = True

Import .docx contents into MS Access

I began writing a docx document to do a project of mine.
Recently, I realized that it would be easier to manage that data if it was in a database.
So, I wanted to import that data into MS Access automatically, to avoid copying and pasting the data manually.
Is there anyway to do it? I have only encontered ways of opening Word application via Access. I also know that docx has a XML structure, so I imagine if I can open that structure, it would be easy to do a parser in VBA
There are two basic ways information can be taken out of a Word document and put into an Access database: automating the Word object model using VBA code running in either Word or Access OR extracting the WordOpenXML that makes up the Word document. You indicate you lean towards the second option.
Here, again, there are a number of approaches available:
Use VBA in Word or Access to extract the WordOpenXML of the document open in the Word application user interface.
Use VBA in Access together with non-VBA tools to "crack open" the Zip file and extract the XML.
Use the tools available in the .NET Framework to extract the content of the ZIP file and write it to Access using an OLE DB connection.
I understand your goal is to be able to recreate the document at a later point for printing, so you want to preserve all the formatting. In addition, you want to be able to read the content from within Access.
I believe this will require a minimum of four fields in the Access table:
ID
Title
Text of song
The complete WordOpenXML for re-creating the document
You don't mention (4) in the discussion and problem description, but if you want to store the formatting AND you want to be able to read the content I believe this is necessary. While WordOpenXML is "readable", there's a lot of mark-up in there which doesn't make reading comfortable.
All things being equal, I'd go for either VBA working on the open Word document or the .NET approach, using the Open XML SDK (free download .NET library you can reference in Visual Studio and distribute with solutions).
One important thing to keep in mind is storing the Word Open XML in the database. Unless something has changed in Access, you can't store the ZIP file - you need a "streamable" format. That would be the OOXML OPC flat-file format.
When you read the WordOpenXML from a document using VBA, that's what you get, which is why that would be an option for me. The Open XML SDK doesn't have that option, but there is code available from Eric White's blog for doing this.
When you later want to recreate and print the document it should be enough to stream the WordOpenXML to a file with the .xml extension. Or you could convert it back to a docx zip file (same blog).

Easiest way to add raw Open XML code to a Word document

I want to insert a chunk of formatted text into a Word document in VBA. Since this will be done server-side, I am not advised take these chunks from another Word document using Office Interop (link), so I presume it would be easiest to use Open XML like this <w:p><w:r><w:t>String from WriteToWordDoc method.</w:t></w:r></w:p> Sadly Application.ActiveDocument.Range.InsertXML fails. So what other quick and dirty alternatives do I have?
For the "bare necessities" see this MSDN article:
https://msdn.microsoft.com/en-us/library/office/dn423225.aspx?f=255&MSPPError=-2147217396
It describes what the minimum requirements are for inserting Open XML into an open Word document. Don't worry that the discussion targets Web add-ins - the principle is the same for any API that inserts WordOpenXML into a Word document at run-time.
I may be misunderstanding where "server-side" is involved in your process, so forgive me if the following is not relevant to your situation: Note that using VBA server-side is a bad idea - Word is not designed to run in a server environment. Better would be to use the Open XML SDK both to retrieve and write the information between two Word documents.

Outlook Search Folder Criteria

Is it possible to set criteria of a Search Folder in Outlook to a large number of keywords, like this:
subject contains "abc" OR "def" OR "ghi" ...
Search for the word(s) box in Search Folder Criteria does support comma-separated list of values, but the max. length of that box is quite limited (255 chars I guess). The number of keywords I have is very large (hundreds of them). Adding them manually through Advanced tab is also a pain, so I'm looking for a more programmer's way. One of the following should work:
If Outlook stores these criteria in a flat file somewhere (like Thunderbird does), I'll edit it directly and inject my keywords into it.
If I could manipulate the criteria through Object Model (VBA), that's also a good solution.
If VSTO can do it, I have that experience as well.
Does someone know if any of the above method works?
Note: I'm using Outlook 2013 if that has something to do.
You can develop either a VBA macro or an Outlook add-in (VSTO can be used for developing an add-in). Try using the AdvancedSearch method of the Application class. See Advanced search in Outlook programmatically: C#, VB.NET for more information.

Search MS Word binary file for specific content

I have some .doc binary files stored in my database and i would like to now search them all (without converting them to .doc) to see which one contains the word "hello" for instance.
Is there any way to do this search in the binary file?
You could go down the route of using commercial tools. Aspose.Words can load a document from a stream and has all sorts of methods for finding text within the document.
If you have the stream from the DB, then you code would look like this:
Aspose.Words.Document doc = new Aspose.Words.Document(streamObjectFromDatabase);
if (doc.GetText().ToLower().Contains("hello world"))
MessageBox.Show("Hello World exists");
Note: The benefit of this tool is that it does not require Word objects to be installed and it can work with streams in memory.
Not without a lot of pain, as far as I can tell. According to Wikipedia, Microsoft has within the past few years finally released the .doc specification. So you could create a parser based on the spec if you have the time, assuming all of your documents are in the same version of the .doc format.
Of course you could just search for the text you're looking for amid all the binary data, on the assumption that the actual text is stored as plain text. But even if that assumption were true, how could you be sure that the plain text you found was the actual document text, and not some of the document meta data that's also stored in plain text? And there's always the off chance that the binary data will match your text pattern.
If the Word libraries are available to you, I would go that route. If not, a homegrown parser may be your least bad option.