MS Word macro to search for and combine specific files into one document - vba

I have encountered a stumbling block that exceeds my experience in working with MS Word macros. Until recently, I have not had much need to use macros in Word or Excel, and what little bit I have needed was completely internal to the document or spreadsheet.
Now I find myself needing guidance on creating a macro that will draw on external files to create a temporary new file.
Within a working folder, I have a Master Document and numerous daughter documents. The daughter documents all have file tags/keywords.
What I need is a protected document (it can be DOC, PDF or HTML) that, when opened, will run a macro that will (1) query the user for a search term, (2) search all the files in the folder for file tags or keywords that match, and (3) open all those matching files in a single HTML page for display.
All the source files are Word 2010 but can be saved as PDF or HTML if it makes things easier.
Thanks in advance.

Word's Master Document 'feature' is a disaster waiting to happen! Use it only if you don't value your work. See:
https://wordmvp.com/FAQs/General/WhyMasterDocsCorrupt.htm
http://www.addbalance.com/word/masterdocuments.htm
As for opening all your source documents in a single window, that's not possible unless you actually merge those documents into a single file. The code for that is complex. For some code to get you started, see:
https://www.mrexcel.com/forum/general-excel-discussion-other-questions/1037570-vba-combine-differently-formatted-word-files-into-1-while-preserving-layouts-post4980503.html#post4980503
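For the search step, one option is to do the work outside Word. The following is a minimal sketch in Python rather than VBA. It assumes the daughter documents are saved as .docx, where the Tags/Keywords property is stored in the docProps/core.xml part of the file. The function names, the comma/semicolon separator handling, and the bare-bones HTML output are illustrative assumptions; formatting, images and table layout are all dropped.

```python
import html
import re
import xml.etree.ElementTree as ET
import zipfile
from pathlib import Path

CP = "{http://schemas.openxmlformats.org/package/2006/metadata/core-properties}"
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def read_keywords(docx_path):
    """Return the lower-cased tags/keywords stored in a .docx file."""
    with zipfile.ZipFile(docx_path) as archive:
        if "docProps/core.xml" not in archive.namelist():
            return []
        root = ET.fromstring(archive.read("docProps/core.xml"))
    text = root.findtext(CP + "keywords") or ""
    # Assumption: keywords are separated by commas or semicolons.
    return [k.strip().lower() for k in re.split(r"[;,]", text) if k.strip()]


def read_paragraphs(docx_path):
    """Return the plain text of each non-empty paragraph (formatting is dropped)."""
    with zipfile.ZipFile(docx_path) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    paragraphs = []
    for para in root.iter(W + "p"):
        text = "".join(t.text or "" for t in para.iter(W + "t"))
        if text:
            paragraphs.append(text)
    return paragraphs


def build_page(folder, term):
    """Combine every document tagged with `term` into one HTML string."""
    term = term.strip().lower()
    parts = ["<html><body>"]
    for path in sorted(Path(folder).glob("*.docx")):
        if term in read_keywords(path):
            parts.append("<h2>%s</h2>" % html.escape(path.name))
            parts.extend("<p>%s</p>" % html.escape(p) for p in read_paragraphs(path))
    parts.append("</body></html>")
    return "\n".join(parts)
```

Writing the result of build_page(folder, term) to a temporary .html file and opening it in a browser would cover the display step. Keeping the layouts of the source documents would still need a real merge inside Word.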

Related

How Do I build A PDF Find and Replace Text App and Automate PDF Processing?

I need help creating a solution for PDF text replacement. We hired a programmer who tried to achieve our objective with a Python-coded app but failed.
Our project hopes to automate these steps:
Get a folder with pdf documents
In Each document, we need to find particular text (usually in the upper third of the page). The information can be on multiple pages.
Hide or erase the text and replace it with different information that we would pull from Excel or a SQL database. Make sure that the text is replaced at every occurrence.
Next, rename the document based on the assigned doc name (pulled from excel or SQL).
Have a report of all documents that were processed, showing for which ones a new version was and was not created.
Keep the original document saved for review and comparison.
I am happy to provide the original code from the developers if necessary, but it did not work...
Thank you for your help, community!
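Leaving the PDF editing itself aside, the surrounding steps (walk the folder, rename, report, keep the originals) can be sketched on their own. This is a hedged Python sketch, not the developers' code: replace_text is a hypothetical callable standing in for whatever PDF library does the actual replacement, and the mapping dictionary stands in for the rows pulled from Excel or SQL. New files go to a separate output folder, so the originals stay untouched for review.

```python
import csv
from datetime import datetime
from pathlib import Path

REPORT_FIELDS = ["original", "new_name", "status", "processed_at"]


def process_folder(src_dir, out_dir, mapping, replace_text):
    """Run replace_text over every PDF in src_dir and return report rows.

    mapping: {original file name: (new_text, new_file_name)}.
    replace_text(src_path, dst_path, new_text) -> bool is a hypothetical
    callable that writes the edited PDF to dst_path and returns False
    when the text to replace was not found.
    """
    src_dir, out_dir = Path(src_dir), Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    report = []
    for pdf in sorted(src_dir.glob("*.pdf")):
        new_name, status = "", "skipped: no mapping"
        if pdf.name in mapping:
            new_text, new_name = mapping[pdf.name]
            try:
                found = replace_text(pdf, out_dir / new_name, new_text)
                status = "created" if found else "not created: text not found"
            except Exception as exc:
                # Keep going so one bad file does not stop the batch.
                status = "not created: %s" % exc
        report.append({
            "original": pdf.name,
            "new_name": new_name,
            "status": status,
            "processed_at": datetime.now().isoformat(timespec="seconds"),
        })
    return report


def write_report(report, csv_path):
    """Save the report rows as a CSV file."""
    with open(csv_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=REPORT_FIELDS)
        writer.writeheader()
        writer.writerows(report)
```

Separating the batch driver from the replacement function means the driver can be tested with a dummy function first, which helps narrow down whether a failure is in the PDF handling or in the bookkeeping.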

Scan a directory, generate a dynamic form, and take user input, to build master document?

Before staring down a long road leading to a dead end--and especially so since I have to dust off Perl programming skills, then learn VBA--is the following scenario feasible?
Using Word 2010 VBA:
Open a starting .docm file (potentially a master document)
Display a form
Require user to enter data: project name, date, etc.
Scan the starting file's directory
Collect document properties: title, subject, total pages
Create a dynamic list from all document properties.
Insert list into form.
Allow user to select required documents (e.g. checkboxes)
Add selected documents to end of starting file.
Update inserted documents with user data (above): project name, date, etc.
Generate table of contents at beginning of starting file.
Prompt user to save file.
This is all feasible from Word VBA. From the description I wouldn't use Access, unless you need to store a large amount of data (or structured data) permanently - your description doesn't indicate this. Even then, if the data is just a simple (1D) table, I would prefer Excel to store it.
I wouldn't touch the Master Documents feature (if that is what you are referring to):
A master document has only two possible states: Corrupt, or just about
to be corrupt. And that is why we say that the only possible fix to a
master document is “don't use it!”
Why Master Documents Corrupt (MVP)
That page links to a further page here that describes how Master Documents might be used safely.
Besides which, your outline suggests that you are already creating your own version of a Master Document.
Hint: Rather than attempting to insert the document content as a file I would consider inserting a Section Break and then exploring the variety of Paste (and PasteSpecial) methods. Hans has some very useful code here.
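The "scan the directory and collect document properties" steps can also be prototyped without opening Word at all. The sketch below is Python, not VBA, and assumes .docx/.docm files, where title and subject live in docProps/core.xml and the page count in docProps/app.xml. The function names are illustrative, and the page count is only the value Word stored at the last save, so it may be stale.

```python
import xml.etree.ElementTree as ET
import zipfile
from pathlib import Path

DC = "{http://purl.org/dc/elements/1.1/}"
EP = "{http://schemas.openxmlformats.org/officeDocument/2006/extended-properties}"


def _read_xml(archive, name):
    """Parse one part of the package, or return None if it is missing."""
    if name not in archive.namelist():
        return None
    return ET.fromstring(archive.read(name))


def _text(root, tag):
    """Return the text of a direct child element, or '' if absent."""
    return (root.findtext(tag) if root is not None else None) or ""


def document_properties(docx_path):
    """Return file name, title, subject and stored page count for one document."""
    with zipfile.ZipFile(docx_path) as archive:
        core = _read_xml(archive, "docProps/core.xml")
        app = _read_xml(archive, "docProps/app.xml")
    pages = _text(app, EP + "Pages")
    return {
        "file": Path(docx_path).name,
        "title": _text(core, DC + "title"),
        "subject": _text(core, DC + "subject"),
        "pages": int(pages) if pages.isdigit() else None,
    }


def scan_folder(folder):
    """Collect properties for every .docx/.docm file, skipping Word lock files."""
    paths = sorted(Path(folder).glob("*.doc[xm]"))
    return [document_properties(p) for p in paths if not p.name.startswith("~$")]
```

The returned list of dictionaries is the kind of data that would populate the checkbox list on the form; in VBA the same values are available through each document's BuiltInDocumentProperties collection.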

VBA Word: How to separate content from formatting information?

I am trying to write a VBA-macro that converts a given MS word document into a sequential list of the document objects contained in that document (e.g., Paragraph, Table, etc.). For each of those objects I want to extract the text contained and its explicit formatting information to save it in a DB.
Would you have any pointers for me on how to get started? Are there any elegant solutions to this document parsing task?
Without knowing your full requirements, these are just some suggestions.
You may be able to do what you want, but it will be a mammoth task to pull apart Word documents and be able to stitch them back together. If you did want to go with this approach, the best might be to pull out paragraphs, images etc. and save these sections as individual documents in your database. They can then be put back together using
' Walk the paragraphs and show the text of each one
Dim i As Long
For i = 1 To ActiveDocument.Paragraphs.Count
    MsgBox ActiveDocument.Paragraphs.Item(i).Range.Text
Next i
' Append text to the end of the document (AnotherDocument is a placeholder)
ActiveDocument.Content.InsertAfter AnotherDocument
This is incredibly basic and will be a LOT of work to get working correctly.
I wonder whether turning the documents into HTML would be better (done simply by saving as HTML); you could then use open source libraries to allow users to edit parts of the document. E.g. add the jeditable plugin for jQuery and almost any paragraph in your HTML Word document becomes editable. Add a simple backend PHP script to save the changes and you have something that works. You can then also note what has changed for translation purposes.
The docs can be saved back as Word docs or PDFs before being sent to the customer.
Just an idea.
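A different angle on the parsing task, offered as a rough sketch rather than a VBA answer: a .docx file is a zip archive, and word/document.xml already holds the body as a sequence of paragraph and table elements with the explicit run formatting stored next to the text. The Python below walks that sequence. It assumes the .docx format (not .doc), the function names are made up for illustration, and it only records which formatting elements are present on each run without interpreting their values or resolving styles.

```python
import xml.etree.ElementTree as ET
import zipfile

W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def _joined_text(element):
    """Concatenate all text nodes below an element."""
    return "".join(t.text or "" for t in element.iter(W + "t"))


def run_format(run):
    """List the explicit formatting elements on a run, e.g. ['b', 'i']."""
    props = run.find(W + "rPr")
    if props is None:
        return []
    # Only element names are recorded; attribute values such as w:val="0"
    # are not interpreted in this sketch.
    return sorted(child.tag.split("}")[-1] for child in props)


def document_objects(docx_path):
    """Return the top-level paragraphs and tables of the body, in order."""
    with zipfile.ZipFile(docx_path) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    objects = []
    for child in root.find(W + "body"):
        if child.tag == W + "p":
            runs = [{"text": _joined_text(r), "format": run_format(r)}
                    for r in child.iter(W + "r")]
            objects.append({"type": "Paragraph", "runs": runs})
        elif child.tag == W + "tbl":
            rows = [[_joined_text(cell) for cell in row.findall(W + "tc")]
                    for row in child.findall(W + "tr")]
            objects.append({"type": "Table", "rows": rows})
    return objects
```

Each dictionary in the result maps naturally onto one database row, with the run list serialised alongside it. Formatting inherited from styles is not captured here, which matches the "explicit formatting" requirement but would need extra work if effective formatting is wanted.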

VB.net text -> Excel conversion (with extensive formatting required after conversion)

I'm creating a program in VB.net that does the following:
At a high level: I receive a file in email, put the attachment in a monitored folder, import the text file to Excel, format the Excel file, and then email it to a list of recipients.
Here is my plan:
Completed: Outlook VBA to monitor all incoming email for specific message. Once message is received drop attached .txt file in a specific network folder.
Completed: (VB.net) Monitor folder, when text file is added begin processing
Not Complete: (VB.net) Import text file to Excel
Not Complete: (VB.net) Format Excel Text file. (add in a row of data, format column headers with color/size, add some blank columns, add data validation to some of the blank columns that allow drop down selections)
Completed: (VB.net) Save file.
Completed: (VB.net) Send file to list of recipients.
Obviously the items above that are not complete are the bulk of the work, but I wanted to get some advice on what some of you think would be the best way to approach something like this. The import and formatting of the file are causing me some problems because I just can't decide what would be the most efficient way to do this.
What I've thought of so far:
The way stated above. Import to excel -> format
Having a template excel that contains all of the formatting already done for me and attempting to transition the data to this document (no clue if/how I can do this). Is it even feasible? Have the template already created and then import the text file to a new excel file, then transition that data to the excel template?
Something I thought about, in terms of formatting the document, was to record a macro of me doing all of the formatting that I'm going to need and then attempt to convert that macro into my VB.net code, but I'm not sure if that will work. I will need to verify that the text file comes in the EXACT same format every time, correct?
I really appreciate any advice/suggestions that anyone is willing to give.
You will want to use http://epplus.codeplex.com/
It allows you to create an Excel file from scratch, without having to start Excel itself. Automating Excel will make the process slow and it lacks robustness (Excel process can hang or not close properly).
In addition, using a .Net library allows you to run it on a server or so where no Excel is installed. (Next step would be to inspect the mailbox via POP, IMAP or the Exchange API, so that part doesn't have to be run on a client machine either)
http://msdn.microsoft.com/en-us/library/kh3965hw(v=vs.100).aspx
You can also just use the Interops from MS to interact with Excel, Outlook, Word, etc. They're not difficult at all to use. I'm not familiar with CodePlex, so that may be a better route or an easier one. I just wanted to provide you with an alternative.
With Microsoft Office 2010 Interops you can not generate Office files from .net applications anymore.
You can manipulate data from existing Excel files, so you need templates (your 4th point). Excel also allows you to query some databases. You may be able to simulate one with your folder; otherwise I suggest converting your .txt files into some kind of database (3rd point).
If you do use an older version, you can create your Excel files by loading them into an instance of Excel and manipulating them as you wish.
By the way I supposed your attached files would have some sort of format.
If you want to manipulate Excel files, I can recommend the NPOI library found on CodePlex. It has several advantages over OLE automation:
NPOI is not dependent on a specific Excel version.
Excel (or any other Office component) need not be installed.
It is faster.
It works with both .XLS and .XLSX files.
We are using third-party software called Excel Writer. It may not be what you are looking for because it needs to be licensed, but it is very fast and the clients do not have to wait for a chart or a data output. Because we have that tool, we have not tried anything else.
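On the question of whether the text file must arrive in the exact same format every time: whichever Excel library is chosen, it is cheap to check the layout before importing and fail loudly if it has drifted. A small sketch in Python (not VB.net), assuming a tab-delimited file with a header row; the column names in EXPECTED_HEADER are hypothetical placeholders.

```python
import csv

EXPECTED_HEADER = ("id", "name", "amount")  # hypothetical column names


def load_rows(path, expected_header=EXPECTED_HEADER, delimiter="\t"):
    """Read a delimited text file and fail fast if its layout has changed."""
    expected = list(expected_header)
    rows = []
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle, delimiter=delimiter)
        header = next(reader, None)
        if header != expected:
            raise ValueError("unexpected header: %r" % (header,))
        # Data starts on line 2 of the file, after the header.
        for line_no, row in enumerate(reader, start=2):
            if len(row) != len(expected):
                raise ValueError(
                    "line %d has %d fields, expected %d"
                    % (line_no, len(row), len(expected))
                )
            rows.append(row)
    return rows
```

With a check like this in front, the template approach becomes safer: the formatting lives in the template, and only rows that passed validation are copied into it.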

advice on technology to use for document/form creation and indexing

My customer currently stores his documents, which are single-page automotive forfeits, in a single MS Word document... this method is of course generating a huge file which is slow to open, not to mention slow to search.
After a user compiles a document, he may need to print it to manually sign it. Then the document is scanned back and stored in PDF format. The document may be printed again to be signed a second time by a manager. The doubly signed document is scanned again and saved, overwriting the singly-signed one.
The user wants to be able to search the document using a couple of search keys (the doc number and a sort of a SSN). That is the reason they are using a single file, to be able to search in the file using Word's search feature.
I have to propose an IT solution. I was thinking about giving them a software tool that:
reads a pdf form/template; the template rarely changes
shows the template on the screen and allows the user to input his variable fields in the form
some of the fields must be defined as searchable
the user saves only the form fields, not the whole pdf.
the sw is able to rebuild a document by coupling the template with the fields. I have to find a way to tie the template with the saved fields, so that the template can change (versioning) without breaking the old documents
the tool allows searching across multiple documents, using the defined search fields
the tool allows printing the document to manually sign it; this is the hard part. Once the document is signed it cannot be changed anymore, but if the document is simply scanned and coupled with the form/fields PDF, then I'll lose the benefits of only storing the data decoupled from the template. Should I only scan the signature and attach it to the document as an image?
What do you suggest to use?
Adobe XML Forms?
Adobe Forms Data Format?
An already existing software?
Other?
For the existing documents, I want to allow the customer to import his huge MS Word file into the new system.
Thanks.
Sounds like you want a PDF form template that submits data to a database that can be searched.
OTOH, if you just save the PDFs, Acrobat Pro can generate an index file from a directory, which can then be searched (from Reader?). Yep, you can run searches on an index from Reader, but you can only build them with Acrobat.
I prefer AcroForms to LiveCycle forms myself. There's a lot more software out there that works with 'em. If you go with LiveCycle, you're almost completely locked into Adobe. And Adobe server software is EXPENSIVE.
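Whichever form technology is picked, the "save only the fields, keep the template versioned, search on two keys" part is a small database problem. Here is a minimal sketch in Python with SQLite, under the assumption that each document is identified by its doc number and that the second search key is a person identifier. The table layout, column names and function names are all illustrative, not part of any Adobe product.

```python
import json
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) the store; fields are kept apart from the template."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS documents ("
        " doc_number TEXT PRIMARY KEY,"
        " person_id TEXT NOT NULL,"
        " template_version INTEGER NOT NULL,"
        " fields_json TEXT NOT NULL)"
    )
    conn.execute("CREATE INDEX IF NOT EXISTS idx_person ON documents(person_id)")
    return conn


def save_document(conn, doc_number, person_id, template_version, fields):
    """Store the form fields together with the template version they belong to."""
    with conn:
        conn.execute(
            "INSERT OR REPLACE INTO documents VALUES (?, ?, ?, ?)",
            (doc_number, person_id, template_version, json.dumps(fields)),
        )


def find_documents(conn, doc_number=None, person_id=None):
    """Search by either key or both; returns a list of dictionaries."""
    clauses, params = [], []
    if doc_number is not None:
        clauses.append("doc_number = ?")
        params.append(doc_number)
    if person_id is not None:
        clauses.append("person_id = ?")
        params.append(person_id)
    where = (" WHERE " + " AND ".join(clauses)) if clauses else ""
    rows = conn.execute(
        "SELECT doc_number, person_id, template_version, fields_json"
        " FROM documents" + where + " ORDER BY doc_number",
        params,
    )
    return [
        {"doc_number": r[0], "person_id": r[1],
         "template_version": r[2], "fields": json.loads(r[3])}
        for r in rows
    ]
```

Recording template_version with every row is what lets the template change later without breaking old documents: rebuilding a document means loading the template of that version and filling in the stored fields. A scanned signed copy could be referenced from the same row by an extra column holding its file path.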