I am trying to write a VBA macro that converts a given MS Word document into a sequential list of the document objects contained in that document (e.g., Paragraph, Table, etc.). For each of those objects I want to extract the text it contains and its explicit formatting information, and save both in a DB.
Would you have any pointers for me on how to get started? Are there any elegant solutions to this document parsing task?
Without knowing your full requirements, these are just some suggestions.
You may be able to do what you want, but it will be a mammoth task to pull Word documents apart and be able to stitch them back together. If you did want to go with this approach, the best option might be to pull out paragraphs, images, etc. and save these sections as individual documents in your database. You can walk through the paragraphs with
' Show the text of each paragraph in turn
For i = 1 To ActiveDocument.Paragraphs.Count
    MsgBox ActiveDocument.Paragraphs.Item(i).Range.Text
Next i
and the sections can then be put back together using
' AnotherDocument is a placeholder for the stored text of a section
ActiveDocument.Content.InsertAfter AnotherDocument
This is incredibly basic and will be a LOT of work to get working correctly.
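As a slightly fuller starting point, here is a minimal sketch of the paragraph loop that also reports some formatting. It is only an illustration: the names ListDocumentObjects and StripTrailingMarks are made up for this example, Debug.Print stands in for whatever database write you end up using, and paragraphs with mixed formatting will report wdUndefined (or an empty font name) rather than per-run detail.

```vba
' Remove trailing paragraph marks and end-of-cell markers from a range's text.
Function StripTrailingMarks(ByVal s As String) As String
    Do While Len(s) > 0
        Select Case Right$(s, 1)
            Case vbCr, Chr$(7)
                s = Left$(s, Len(s) - 1)
            Case Else
                Exit Do
        End Select
    Loop
    StripTrailingMarks = s
End Function

' Hypothetical sketch: print one line per paragraph with basic formatting.
' Replace Debug.Print with your own storage code (assumption: none is given here).
Sub ListDocumentObjects()
    Dim para As Paragraph
    Dim kind As String

    For Each para In ActiveDocument.Paragraphs
        ' Paragraphs that sit inside a table cell are flagged separately.
        If para.Range.Information(wdWithInTable) Then
            kind = "TableParagraph"
        Else
            kind = "Paragraph"
        End If

        ' Font properties describe the whole paragraph; mixed formatting
        ' inside it shows up as wdUndefined or an empty name.
        Debug.Print kind & " | style=" & para.Style & _
            " | font=" & para.Range.Font.Name & _
            " | size=" & para.Range.Font.Size & _
            " | bold=" & para.Range.Font.Bold & _
            " | " & StripTrailingMarks(para.Range.Text)
    Next para
End Sub
```

Run ListDocumentObjects with the document open and look at the Immediate window. If you need formatting below paragraph level, you would have to walk smaller ranges (words or characters) in the same way, which is slower.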
I wonder whether turning the documents into HTML would be better (done simply by saving as HTML); you can then use open source libraries to let users edit parts of the document. E.g., add the jeditable plugin for jQuery and almost any paragraph in your HTML Word document becomes editable. Add a simple backend PHP script to save the changes and you have something that works. You can then also note what has changed for translation purposes.
The docs can be saved back as Word docs or PDFs before being sent to the customer.
Just an idea.
Related
I have a Word document with a building block dropdown to choose from several templates. I have the Word document set up with a toolbar containing the Send To Recipient email option. I'm trying to figure out a way to get the subject line to change depending on which building block is chosen, if it's even possible.
Currently, my office is using different word files for each subject line, and I'm trying to get all templates in one file to cut down on all the different files workers need to have open at once.
Is this possible?
I have limited VB experience, but I have tried searching for macros and code similar to what I need and changing things to fit, and I'm at a loss. Anything I try outside of that just results in the subject line being static, remaining as the last edit.
I need help creating a solution that would help me resolve PDF text replacement. We hired a programmer that tried to achieve our objective with a python-coded app but failed.
Our project hopes to automate these steps:
Get a folder with pdf documents
In each document, we need to find particular text (usually in the upper third of the page). The information can be on multiple pages.
Hide or erase the text and replace it with different information that we would pull from Excel or a SQL database. Make sure that the text is replaced in every occurrence.
Next, rename the document based on the assigned doc name (pulled from excel or SQL).
Have a report of all documents that were processed, showing for each one whether a new version was created and when.
Keep the original document saved for review and comparison.
I am happy to provide the original code from the developers if necessary, but it did not work...
Thank you for your help, community!
I have encountered a stumbling block that exceeds my experience in working with MS Word macros. Until recently, I have not had much need to use macros in Word or Excel, and what little bit I have needed was completely internal to the document or spreadsheet.
Now I find myself needing guidance on creating a macro that will draw on external files to create a temporary new file.
Within a working folder, I have a Master Document and numerous daughter documents. The daughter documents all have file tags/keywords.
What I need is a protected document (it can be DOC, PDF or HTML) that, when opened, will run a macro that will (1) query the user for a search term, (2) search all the files in the folder for file tags or keywords that match, and (3) open all those matching files in a single HTML page for display.
All the source files are Word 2010 but can be saved as PDF or HTML if it makes things easier.
Thanks in advance.
Word's Master Document 'feature' is a disaster waiting to happen! Use it only if you don't value your work. See:
https://wordmvp.com/FAQs/General/WhyMasterDocsCorrupt.htm
http://www.addbalance.com/word/masterdocuments.htm
As for opening all your source documents in a single window, that's not possible unless you actually merge those documents into a single file. The code for that is complex. For some code to get you started, see:
https://www.mrexcel.com/forum/general-excel-discussion-other-questions/1037570-vba-combine-differently-formatted-word-files-into-1-while-preserving-layouts-post4980503.html#post4980503
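If you do go the merge route, a rough sketch of the search-and-combine step might look like the following. It assumes the tags are stored in each file's built-in Keywords property, that the daughter files are .docx files in the same folder as the document holding the macro, and that a plain InsertFile is good enough. It makes no attempt to preserve differing page layouts, which is the hard part the linked code deals with. KeywordsMatch and MergeMatchingDocuments are names invented for this example.

```vba
' True when the search term occurs in the keyword string, ignoring case.
Function KeywordsMatch(ByVal keywords As String, ByVal term As String) As Boolean
    KeywordsMatch = (Len(term) > 0) And (InStr(1, keywords, term, vbTextCompare) > 0)
End Function

' Hypothetical sketch: ask for a term, then append every matching file
' from this document's folder to a new, unsaved document.
Sub MergeMatchingDocuments()
    Dim folderPath As String, docName As String, term As String
    Dim kw As String
    Dim target As Document, src As Document
    Dim rng As Range

    term = InputBox("Search term:")
    If Len(term) = 0 Then Exit Sub

    ' ThisDocument is the document (or template) that contains this macro.
    folderPath = ThisDocument.Path & Application.PathSeparator
    Set target = Documents.Add

    docName = Dir(folderPath & "*.docx")
    Do While Len(docName) > 0
        ' Skip the document that holds the macro.
        If StrComp(folderPath & docName, ThisDocument.FullName, vbTextCompare) <> 0 Then
            ' Open read-only and hidden just to read the Keywords property.
            Set src = Documents.Open(FileName:=folderPath & docName, _
                ReadOnly:=True, AddToRecentFiles:=False, Visible:=False)
            kw = src.BuiltInDocumentProperties("Keywords").Value
            src.Close SaveChanges:=wdDoNotSaveChanges

            If KeywordsMatch(kw, term) Then
                ' Append the whole file at the end of the combined document.
                Set rng = target.Content
                rng.Collapse Direction:=wdCollapseEnd
                rng.InsertFile FileName:=folderPath & docName
            End If
        End If
        docName = Dir
    Loop
End Sub
```

The combined document is left open and unsaved, so you can decide afterwards whether to display it as is or save it in another format.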
Before staring down a long road leading to a dead end--and especially so since I have to dust off Perl programming skills, then learn VBA--is the following scenario feasible?
Using Word 2010 VBA:
Open a starting .docm file (potentially a master document)
Display a form
Require user to enter data: project name, date, etc.
Scan the starting file's directory
Collect document properties: title, subject, total pages
Create a dynamic list from all document properties.
Insert list into form.
Allow user to select required documents (e.g. checkboxes)
Add selected documents to end of starting file.
Update inserted documents with user data: project name, date, etc. (above)
Generate table of contents at beginning of starting file.
Prompt user to save file.
This is all feasible from Word VBA. From the description I wouldn't use Access, unless you need to store a large amount of data (or structured data) permanently - your description doesn't indicate this. Even then, if the data is just a simple (1D) table, I would prefer Excel to store it.
I wouldn't touch the Master Documents feature (if that is what you are referring to):
A master document has only two possible states: Corrupt, or just about
to be corrupt. And that is why we say that the only possible fix to a
master document is “don't use it!”
Why Master Documents Corrupt (MVP)
That page links to a further page here that describes how Master Documents might be used safely.
Besides which, your outline suggests that you are already creating your own version of a Master Document.
Hint: Rather than attempting to insert the document content as a file I would consider inserting a Section Break and then exploring the variety of Paste (and PasteSpecial) methods. Hans has some very useful code here.
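To illustrate the hint, here is a minimal sketch of appending one document to another behind a section break. It uses Range.FormattedText rather than the clipboard-based Paste methods, which is a substitution on my part to keep the example short; AppendDocumentAsNewSection and its parameters are invented names, and headers, footers and page setup of the source are not handled.

```vba
' Hypothetical sketch: append the content of the file at sourcePath to the
' end of target, starting it in a new section on a new page.
Sub AppendDocumentAsNewSection(ByVal target As Document, ByVal sourcePath As String)
    Dim src As Document
    Dim rng As Range

    Set src = Documents.Open(FileName:=sourcePath, ReadOnly:=True, _
        AddToRecentFiles:=False)

    ' Add a next-page section break at the end of the target.
    Set rng = target.Content
    rng.Collapse Direction:=wdCollapseEnd
    rng.InsertBreak Type:=wdSectionBreakNextPage

    ' Copy the formatted content across without touching the clipboard.
    Set rng = target.Content
    rng.Collapse Direction:=wdCollapseEnd
    rng.FormattedText = src.Content.FormattedText

    src.Close SaveChanges:=wdDoNotSaveChanges
End Sub
```

You would call it once per selected file, for example AppendDocumentAsNewSection ActiveDocument, "C:\Docs\Part1.docx" (the path is a placeholder), and build the table of contents after the last file has been added.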
My customer currently stores his documents, which are single page automotive forfeits, in a single MS Word document... this method is of course generating a huge file which is slow to open, not to mention slow to search.
After a user compiles a document, he may need to print it to sign it manually. Then the document is scanned back in and stored in PDF format. The document may be printed again to be signed a second time by a manager. The doubly signed document is scanned again and saved, overwriting the singly signed one.
The user wants to be able to search the document using a couple of search keys (the doc number and a sort of a SSN). That is the reason they are using a single file, to be able to search in the file using Word's search feature.
I have to propose an IT solution. I was thinking about giving them a software tool that:
reads a pdf form/template; the template rarely changes
shows the template on the screen and allows the user to input his variable fields in the form
some of the fields must be defined as searchable
the user saves only the form fields, not the whole pdf.
the software is able to rebuild a document by coupling the template with the fields. I have to find a way to tie the template to the saved fields, so that the template can change (versioning) without breaking the old documents
the tool allows searching across multiple documents, using the defined search fields
the tool allows printing the document to sign it manually; this is the hard part. Once the document is signed it cannot be changed anymore, but if the document is simply scanned and coupled with the form/fields PDF, then I'll lose the benefit of storing only the data, decoupled from the template. Should I only scan the signature and attach it to the document as an image?
What do you suggest to use?
Adobe XML Forms?
Adobe Forms Data Format?
An already existing software?
Other?
For the existing documents, I want to allow the customer to import his huge MS Word file into the new system.
Thanks.
Sounds like you want a PDF form template that submits data to a DB that can be searched.
OTOH, if you just save the PDFs, Acrobat Pro can generate an index file from a directory, and that index can be searched (from Reader?). Yep, you can run searches on an index from Reader, but you can only build them with Acrobat.
I prefer AcroForms to LiveCycle forms myself. There's a lot more software out there that works with 'em. If you go with LiveCycle, you're almost completely locked into Adobe. And Adobe server software is EXPENSIVE.