Why does the PDF /ID field not match the last created and last modified dates? - pdf

In the PDF structure, the trailer typically references the document information dictionary, which records the date the document was created and the date it was last modified. If these two match, we should be able to conclude that the document has been untouched. However, I have also encountered PDFs where the creation and last-modified dates match, but the /ID field (containing two hashes of the document) suggests otherwise.
Since the ID field is [<hash of the document when created>, <hash of the document when modified>], shouldn't the two IDs also match when the dates are the same?
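One way to check this on a given file is to pull both values out and compare them side by side. The following is a minimal Python sketch, not a full PDF parser: it assumes an unencrypted file whose trailer and information dictionary are stored as plain text (not inside cross-reference or object streams) and whose /ID entries are written as hex strings. The function name summarize_pdf is made up for this illustration.

```python
import re

# Assumption: /ID is written as two hex strings, e.g. /ID [<AB12...> <CD34...>]
ID_RE = re.compile(rb"/ID\s*\[\s*<([0-9A-Fa-f]+)>\s*<([0-9A-Fa-f]+)>\s*\]")
DATE_RES = {
    "CreationDate": re.compile(rb"/CreationDate\s*\((.*?)\)"),
    "ModDate": re.compile(rb"/ModDate\s*\((.*?)\)"),
}


def summarize_pdf(data):
    """Return the last /ID pair and dates found in raw PDF bytes (hypothetical helper)."""
    ids = ID_RE.findall(data)
    # With incremental updates there may be several trailers; use the last one.
    first_id, second_id = ids[-1] if ids else (None, None)

    dates = {}
    for key, pattern in DATE_RES.items():
        found = pattern.findall(data)
        dates[key] = found[-1].decode("latin-1") if found else None

    return {
        "first_id": first_id.decode("ascii") if first_id else None,
        "second_id": second_id.decode("ascii") if second_id else None,
        "ids_match": first_id is not None and first_id.lower() == second_id.lower(),
        "creation_date": dates["CreationDate"],
        "mod_date": dates["ModDate"],
        "dates_match": dates["CreationDate"] is not None
        and dates["CreationDate"] == dates["ModDate"],
    }
```

Calling summarize_pdf(open("some.pdf", "rb").read()) on a file (the path is a placeholder) shows whether the two identifiers and the two dates agree, which makes it easy to collect examples of the mismatch described above.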

Related

Tables Count in MS Word not as expected

I have a VBA macro to process a number of MS Word files in a folder and create word lists and indexes. The data to be processed is within tables in each file (all text outside the tables must be ignored) and the text happens to be English and Latin.
Final editing has combined the files into a single file (fine...) but for printing layout considerations, a few forced "new page" entries have been added into some tables where the text in a cell within a row is significant.
The code iTables = wordApp.ActiveDocument.Tables.Count now returns the wrong value, increasing the table count by the number of manually entered "new page" entries that are within tables. Any "new page" entries outside of the tables have no impact.
I've searched for a similar problem description and have not found any mention of this in the documentation.
Have I a basic misunderstanding? Has anyone a similar experience and a method to overcome it? (I do not want to split the tables as they are matched to transcriptions of original manuscripts.)

Retrieving reports from iManage based on the Document Name or generating list of documents with document information

I extract reports from iManage on a daily basis and have been searching for macro code that would automate this process. After much searching in various forums, I found this link from Ed Mozley, which was very helpful for understanding the retrieval process from iManage.
Saving to iManage with VBA
To retrieve reports from iManage, Ed mentions using the GetDocument function, which takes 2 parameters (document number, version number). In my case, however, the document number changes every day with the updates after the day-end process and is always unique.
I would like to know if there is a way to generate a list of the documents created on a particular date, where the list contains the Document Name, Document Number, Version ID, Document Creation Time, Database, etc. If I could generate that list, I could compare my list of relevant documents against it, pull the document numbers, save them in an array, and create the relevant copies using the code suggested by Ed Mozley.
Or can we create a copy of a document based on a document name that partially matches the name of a document available in iManage?
Any advice will be of great help.
Thank you
Roshan

Auto fill in multiple date fields in multi page pdf, Acrobat DC PRO

I'm trying to make a PDF that contains multiple pages of forms that were originally paper forms. I have imported them into PDF, added text fields, and am trying to find ways to fill out these forms more quickly.
There is a text field for the DATE on multiple pages. On one page I need the date in YYMMDD format (200129), and on the other pages I need it as 29-JAN-2020. Is there a way I can select or enter the date in one place, maybe in a special text box, and link the other fields so they get their data from that master text box?
In general, just using the same field name across all pages will cause one to update the value of the others, but because you want a different format for the same date value on all pages after number 1, you need to use a calculation. If you make the date on page one the "master", the date field on all subsequent pages can be calculated from it to hold the same value. The date fields on all subsequent pages can use the same field name.
If your date field on page one is named "date_1", then add the following code to the other date fields in the field calculation tab...
event.value = this.getField("date_1").value;
You'll then need to use the Acrobat UI to set the format to what you want.
That's it.
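To make the two target formats concrete, here is a small Python sketch. It runs outside Acrobat and is purely illustrative: it only shows how the same date value maps to the YYMMDD form and to the 29-JAN-2020 form from the question. The function names and the MONTHS table are made up for this example.

```python
from datetime import date

# Fixed English abbreviations so the result does not depend on the system locale.
MONTHS = ("JAN", "FEB", "MAR", "APR", "MAY", "JUN",
          "JUL", "AUG", "SEP", "OCT", "NOV", "DEC")


def as_yymmdd(d):
    """Format a date as YYMMDD, the form wanted on page one."""
    return d.strftime("%y%m%d")


def as_dd_mon_yyyy(d):
    """Format a date as DD-MON-YYYY, the form wanted on the other pages."""
    return f"{d.day:02d}-{MONTHS[d.month - 1]}-{d.year:04d}"


master = date(2020, 1, 29)
print(as_yymmdd(master))       # 200129
print(as_dd_mon_yyyy(master))  # 29-JAN-2020
```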

VB.Net: Read Table from rtf-File

I have some RTF-Files with a table. Is there a way to get the content of the table into a datatable? Or is there a way to convert the table to csv?
I'll post this as a part answer only, as it is not complete, but can be used to solve the issue that you have.
From the document specified in my comment I found this detail...
Table Definitions
There is no RTF table group; instead, tables are specified as paragraph properties. A table is represented as a sequence of table rows. A table row is a contiguous series of paragraphs partitioned into cells. The table row begins with the \trowd control word and ends with the \row control word. Every paragraph that is contained in a table row must have the \intbl control word specified or inherited from the previous paragraph. A cell may have more than one paragraph in it; the cell is terminated by a cell mark (the \cell control word), and the row is terminated by a row mark (the \row control word). Table rows can also be positioned. In this case, every paragraph in a table row must have the same positioning controls (see the controls on the Positioned Objects and Frames subsection of this Specification). Table properties may be inherited from the previous row; therefore, a series of table rows may be introduced by a single .
This detail starts on page 93 of the specification and seems to provide the bulk of what you need to know.
From this point you should read the file into a string and then search it for each subsequent occurrence of \trowd (allowing for the closing \row command). This should allow the traversal of all tables within the RTF document. Using this method, and by analysing data within the table, you should be able to ascertain what is important to your requirements.
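The search described above can be sketched as follows. This is a minimal Python illustration rather than a complete RTF reader (the question is about VB.Net, but the same regular-expression approach carries over): it assumes simple, non-nested tables, strips control words crudely, and does not decode escapes such as \'e9. The function names rtf_table_rows and rows_to_csv are made up for this example.

```python
import csv
import io
import re

# A row runs from \trowd to \row; (?![a-z]) avoids matching longer control words.
ROW_RE = re.compile(r"\\trowd(.*?)\\row(?![a-z])", re.DOTALL)
# \cell ends a cell; \cellx (a cell boundary definition) must not match.
CELL_RE = re.compile(r"\\cell(?![a-z])")
# Crude control-word remover: backslash, letters, optional number, optional space.
CONTROL_RE = re.compile(r"\\[a-z]+-?\d* ?")


def rtf_table_rows(rtf_text):
    """Return a list of rows, each a list of cell strings (hypothetical helper)."""
    rows = []
    for match in ROW_RE.finditer(rtf_text):
        parts = CELL_RE.split(match.group(1))
        cells = []
        # Whatever follows the last \cell mark is not cell content, so skip it.
        for part in parts[:-1]:
            text = CONTROL_RE.sub("", part)
            text = text.replace("{", "").replace("}", "")
            cells.append(text.strip())
        rows.append(cells)
    return rows


def rows_to_csv(rows):
    """Serialize the extracted rows as CSV text."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()
```

The resulting list of rows can then be loaded into whatever table structure is needed, or written out with rows_to_csv.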

Extract MS Word document chapters to SQL database records?

I have a 300+ page word document containing hundreds of "chapters" (as defined by heading formats) and currently indexed by word. Each chapter contains a medium amount of text (typically less than a page) and perhaps an associated graphic or two. I would like to split the document up into database records for use in an iPhone program - each chapter would be a record consisting of a title, id #, and content fields. I haven't decided yet if I would want the pictures to be a separate field (probably just containing a file name), or HTML or similar style links in the content text. In any case, the end result would be that I could display a searchable table of titles that the user could click on to pull up any given entry.
The difficulty I am having at the moment is getting from the word document to the database. How can I most easily split the document up into records by chapter, while keeping the image associations? I thought of inserting some unique character between each chapter, saving to text format, and then writing a script to parse the document into a database based on that character, but I'm not sure that I can handle the graphics in this scenario. Other options?
To answer my own question:
Given a fairly simply formatted Word document:
Convert it to an Open Office XML document.
Write a Python script to parse the document into a database using the xml.sax Python module.
Images are inserted into the record as HTML, to be displayed using a web interface.
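A rough sketch of what such a script might look like is given below. It is not the original script: it assumes the converted file is an OpenDocument content.xml in which chapter titles appear as text:h elements and body text as text:p elements, it uses an SQLite table named chapters chosen for this example, and it leaves out image handling entirely.

```python
import sqlite3
import xml.sax


class ChapterHandler(xml.sax.ContentHandler):
    """Collect [title, content] pairs, starting a new chapter at each heading."""

    def __init__(self):
        super().__init__()
        self.chapters = []
        self._in_heading = False
        self._in_para = False
        self._buf = []

    def startElement(self, name, attrs):
        if name == "text:h":
            self._in_heading = True
            self._buf = []
        elif name == "text:p" and self.chapters:
            # Paragraphs before the first heading are ignored.
            self._in_para = True
            self._buf = []

    def characters(self, content):
        if self._in_heading or self._in_para:
            self._buf.append(content)

    def endElement(self, name):
        if name == "text:h" and self._in_heading:
            self.chapters.append(["".join(self._buf).strip(), ""])
            self._in_heading = False
        elif name == "text:p" and self._in_para:
            text = "".join(self._buf).strip()
            if text:
                chapter = self.chapters[-1]
                chapter[1] = (chapter[1] + "\n" + text).strip()
            self._in_para = False


def parse_chapters(xml_bytes):
    """Parse XML bytes and return the list of [title, content] chapters."""
    handler = ChapterHandler()
    xml.sax.parseString(xml_bytes, handler)
    return handler.chapters


def store_chapters(chapters, conn):
    """Insert chapters into a (hypothetical) chapters table with an auto id."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS chapters "
        "(id INTEGER PRIMARY KEY, title TEXT, content TEXT)"
    )
    conn.executemany(
        "INSERT INTO chapters (title, content) VALUES (?, ?)",
        [tuple(chapter) for chapter in chapters],
    )
    conn.commit()
```

With a real document, the bytes of content.xml would be passed to parse_chapters and the result to store_chapters along with an sqlite3 connection.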