After converting PDFs to Word with the company's authorized program, the tables in my Word file are not converted to table objects. Instead they use a weird format: the table's lines are actually a shape/picture of a table anchored below the text, and the text from the original tables becomes regular paragraphs in which the gaps between cells are filled with (a lot of) spaces.
This gives something like:
Heading 1 Heading 2 Heading 3
Please note that the converted "tables" can be rather complex, with merged cells, which makes working directly on them to reconstitute the table a pretty difficult task (I tried).
After having tried a number of solutions to transform these fake tables back into real ones (which all failed), my last idea is to find the absolute/relative positions of the beginning and the end of the shape (which is supposed to be the table's lines), to set the Selection.Range relative to these two positions, and to delete everything within this range in order to put better Tables back later on (using for example another conversion tool).
My question is: is it possible to set a range/selection using a kind of (X;Y) on the page? Something in the idea of
Selection.setRange (Start:= 20,30 relative to page, End:= 100, 50 relative to page).
I know Word doesn't really work like that, but I don't see too many solutions to the initial problem.
I'm responsible for remediating a PDF that has been generated by a third-party, proprietary system whose layout and design I have no access to. The goal is to pass the Adobe Acrobat DC accessibility checker before publication.
Some of the tables in the PDF span multiple pages horizontally (i.e. with a page break at column 4 of 7). Thus far, I have designated each piece of text content as a "Cell" and grouped those into a "Table Row" tag and defined each header and sub-header as a "Table Header Cell".
However, Acrobat DC seems to get confused as to the relative size and spacing of each table element. It is creating phantom column headers and rearranging or combining rows in order to fit the appearance of a more standard layout PER PAGE. But since I need one cohesive table to span TWO PAGES, this is breaking my accessibility.
Depending on how I nest my table elements, I get a table layout like one of the two examples below:
Example when including blank cells for multi-column header rows
Example when defining the column span of multi-column header rows as "7"
As you can see, the layout is not uniform and does not pass regularity checks. Plus, as I add more rows with several blank cells, the table editor produces an error that reads:
"Unknown Table Structure Encountered"
The only way I have managed to remove this error is to exclude the bolded main-section sub-headers from the tag structure entirely, but I cannot just leave them as untagged content and pass the checker.
Please help.
Signed up just to comment to
Kevin, thanks for replying. Because of the malformed grid, I cannot even click on the cells on Page 2 in order to associate headers. Is there a way to define table structure without using the Table Editor mode? – Glamador Apr 3 at 12:27
but don't have the rep yet to do so:
Glamador - I know this can't help you half a year ago, but it might in the future: I encountered this in a document this week and figured out the "why" and how to get the Table Editor back, but not the easiest/best way to solve the tagging in Acrobat. The issue that is denying you the Table Editor is the table header (TH) cell you created that spans multiple pages.
So if you set a table header cell to something like Row Span: 7, and 3 of those rows are on the second page, Acrobat will give you the "Unknown table structure encountered. Please retag this table using the Reading Order Tool to possibly fix the problem." error any time you try to use the Table Editor on the table that contains that header cell with a multi-page row span. (I'm not working with column spans, but I assume the same applies to them.)
To get the use of your Table Editor back (this does not solve the accessibility tagging, it just stops that error on your table):
Go to your tags
Create a new empty Table Header Cell
Drag the content displayed in the tag from the problem TH to your new TH
Delete the [multiple page row/column spanning, but now empty] problem TH
Repeat if you did this in multiple TH in the same table
You can now use Table Editor again
Note: Because you can't use the Table Editor once these problem headers have been created, you can't use it to see which THs you have set to span multiple pages, or to see their row/column spans. If you are going back to check a document you have already tagged, you will have to look at the document itself and work out which headers are the likely problems to replace. If you create that header span again in a table that goes across multiple pages, you'll be unable to use the Table Editor until you delete the tag with the page-spanning issue.
I haven't found if you can combine TH Row Span settings with IDs/Associated Header Cell IDs and have the user software identify both, so I've been doing the tedious ID association on large but simple tables as my "It's tagged correctly" option, but unfortunately it isn't nearly as fast and easy as Row Spans.
You can edit the tag's object properties by right-clicking on the tag and then you can add an ID there if it doesn't already have one. Be sure each data cell is associated with a header cell. PAC's screen reader preview will also give a good view of the layout to help you get everything associated correctly.
I have a program which outputs a collection of tables in a Word document which I eventually want to post as an HTML file with bookmarks and an index. The tables are grouped by "Name:": there is a 3-row table that contains detailed header information for a section of data, then a second table, which can span multiple pages, that contains the data for that section. There is then a page break so that the next section's header table is on a new page. This can occur for a variable number of sections, numbering in the hundreds. I need to write a script that
searches my document for "Name:", which is unique and would not appear anywhere but the header table,
grabs the text that follows "Name:" within that table cell (for example "Name: Line 1234"),
replaces all the blanks in that text string with underscores to make it a suitable bookmark name,
creates a bookmark with that name,
goes back and creates an index at the front of the document,
saves the file as HTML.
I have a passing familiarity with VB for word, I have used it a bit in excel, but am by no means an expert. I would appreciate any advice on functions and objects that I should be using for this script.
Hey MikeV from what I can gather, your problem seems more conceptual, less specific. What I mean is, have you started yet? Or looking at a blank script page?
I'm relatively new to coding, so I get that myself. What I do is make a list of what I need to do (what you have). Then think of the code or pseudo-code that would go with each step. Then you can start to build your script. You don't have to start with step one (as step 2/3 is often the more interesting bit), but let's do that.
Now, you need to search for a text string containing "Name:". I am proficient with VBA in Excel, but haven't done anything for Word. So I'd look it up. Googling "VBA find word in word document" will bring you to this page, which shows you how to approach step one. So steal their code, alter it to fit your needs and move on to step 2. Repeat the process, and that's how you build your algorithm! :)
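To make steps 2 and 3 concrete, here is a rough sketch of the string handling on its own, written in Python rather than VBA so it can be tried outside Word. The function name bookmark_name and the "B" prefix are made up for illustration. It assumes Word's usual bookmark rules (the name starts with a letter, contains only letters, digits and underscores, and is at most 40 characters) and that text read from a table cell may end with Word's end-of-cell marks.

```python
import re

MAX_LEN = 40  # Word's limit on the length of a bookmark name


def bookmark_name(cell_text, label="Name:"):
    """Turn 'Name: Line 1234' into 'Line_1234'; return None if there is no usable name."""
    pos = cell_text.find(label)
    if pos == -1:
        return None
    # Keep what follows the label; drop padding and Word's end-of-cell marks.
    raw = cell_text[pos + len(label):].strip(" \t\r\n\x07")
    if not raw:
        return None
    # Step 3: each run of blanks becomes a single underscore.
    name = re.sub(r"\s+", "_", raw)
    # Drop characters a bookmark name cannot contain (ASCII-only here for simplicity).
    name = re.sub(r"[^A-Za-z0-9_]", "", name)
    # A bookmark name must start with a letter; "B" is an arbitrary, hypothetical prefix.
    if not name or not name[0].isalpha():
        name = "B" + name
    return name[:MAX_LEN]
```

The same three operations (find the label, replace blanks, clean up) map onto ordinary string functions in VBA, so this is only meant to show the shape of the logic before you look up the Word-specific calls.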
Just a FYI, typically StackOverflow is for specific questions with an answer that can be confirmed, whereas you asked for help building an algorithm. I'd reserve those questions for your programming professor or friend who can help.
cheers
I'm automating a word document in vb.net. My problem is I need a table within another table to repeat the first row. Is there any way to do this?
The table's text wrapping is set to none, and the first row is the only one that has the "repeat as header row" property set.
I CAN'T take the table out of its containing table, so that solution is not an option.
This has nothing to do with the fact that the document is automated too.
Using word 2010.
I just did a quick test with Word 2010.
I created a table, and checked "repeat as header row at the top of each page".
Sure enough, that worked.
Then I created another table, and cut/pasted the first one into it.
The header was not repeated in the Word UI, although the property remained set.
I had a quick look around for properties which might affect the behaviour of nested tables, but couldn't see any.
Then I googled "word nested table repeat header row", which returns quite a few relevant results.
Conclusion: you can't make a header row in a nested table repeat on subsequent pages. Tricks like putting it inside a content control didn't seem to work either.
I have a pretty complex VBA plugin for Word written that automatically creates a report for me, using XML input, cycling through the X objects within the report to create the output. It is currently embedded into a Word Template file .DOCM.
I need to insert into the report a static list of text, based on the name of the item within the XML. For example, within my XML I have entries with a name BLAH1, BLAH2, BLAH3. Every time I see BLAH1, I need to match it with the static INSERT1, and BLAH2 match it with INSERT2, etc.
This seems simple enough, but here lies the problem...
It appears there are no hashmaps in VBA without requiring external libraries, which I can't really rely on, since I can't install items on the machines where this will be running. As a result I can't store this reference data in a hashmap as far as I can tell.
I can't seem to concatenate more than about 20 lines of strings together without hitting a limit within VBA, so I can't just build one big chunk of text and parse it for what I need: there are about 1500 "lines" in my reference data, which greatly exceeds 20.
I also haven't found a way to embed a text, or any other type of file to hold this information within the file, and then parse the data.
I really would like to have everything within the single template file, without requiring additional text or other files to be bundled with the document. If there is no other option, I will go that route, but I wanted to see what creative ideas people at Stackoverflow might have first ;-)
Have you considered using Word's Document Variables? They are name/value pairs stored invisibly within the document. Use ActiveDocument.Variables("BLAH1").Value = "INSERT1" to create one, and Debug.Print ActiveDocument.Variables("BLAH1").Value to retrieve a value (you have to use an error handler to detect non-existent indices if you go that route). Word can store (at least) hundreds of thousands of these things.
I have a Worksheet with 10 columns and data range from A1:J55. Col A has the invoice # and rest of the columns have other demographic data. Goal is to type the invoice number on a cell and display all the rows matching the invoice number from col A.
Besides the AutoFilter function, the only thing that comes to my mind is VBA. Please advise what the best way to get the data is. Thanks in advance for your help.
Alright, I'm pretty proud of this one. Again avoiding VBA, this one uses the volatile function OFFSET to keep moving its MATCH search down the table until it has found all matches. Just make sure you paste enough rows of the formula that if there are many matches, there's room for all of them to appear. If you put a border around your match area then it would be clear if you ever ran out of room and needed to copy down the formula some more.
Again, in the main section, it's just a single formula (using index):
=IFERROR(INDEX($A$1:$J$200,$M3,MATCH(N$2,$A$1:$J$1,0)),"")
This gets to be so simple because the hard work of the lookup is done by an initial column which looks up the next row that matches the invoice number. It has the formula:
=IFERROR(MATCH($L$2,OFFSET($A$1:$A$200,M2,0),0)+M2," ")
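If it helps to see what that helper column is doing, here is a small Python model of the same idea (not Excel itself, just the logic as I understand it). Each step restarts the search just below the previous hit, the way OFFSET shifts the range down by the previous result, and then converts the hit back into an absolute row number. The function name and the sample data are made up for illustration.

```python
def matching_rows(column_a, invoice):
    """Return the 1-based row numbers of every cell in column_a equal to invoice."""
    rows = []
    prev = 0  # plays the role of M2: the last matched row, 0 before the first search
    while True:
        try:
            # Search only rows below the previous match (OFFSET by prev rows).
            pos = column_a.index(invoice, prev)
        except ValueError:
            break  # no further match; the formula shows a blank here instead
        prev = pos + 1  # back to an absolute 1-based row (the "+M2" part)
        rows.append(prev)
    return rows


# Hypothetical column A, header in row 1
column_a = ["Invoice", 101, 102, 101, 103, 101]
print(matching_rows(column_a, 101))
```

Each number this returns is what one cell of the helper column would hold, and the INDEX formula in the main section then reads the other columns from that row.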
Here is the working example that goes with those formulas:
Let me know if you need any further description of how it works, but it mostly uses the same rules as above so that it's robust in copying and moving around.
I've uploaded the Excel file so you can play with it, but everything you need to reproduce this feature should be in this solution.
Google Docs - Click link and hit Ctrl+S to download and open in Excel.
A popular solution to this problem is a simple VLOOKUP. Look up the invoice the user types in on the table A1:J55, and then return an adjacent column's data.
Here's an example of it working:
The formula in the highlighted cell is:
=VLOOKUP($L3,$A:$J,MATCH(N$2,$1:$1,0),FALSE)
What's nice about this formula is you only need to type it once and then you can copy it across and it'll automatically pick out the correct column of the table (that's the match part). The rest is very simple:
The first part says lookup value $L3 (the invoice number typed in),
The second part says look it up in range $A:$J (which is where your table is located). I've shown how you can select the entire columns $A:$J so that you can add and remove data without worrying about adjusting the range in your lookups. (Excel takes care of optimizing the formula so that unused cells aren't checked)
The third part picks the column from which the resulting data will be drawn once a matching row is found.
The FALSE part is an indication that the invoice number must match exactly (no approximate matching allowed)
The $ signs ensure that fixed ranges like the location of your source table ($A:$J) and your lookup value ($L3) don't get automatically changed as you copy the formula across for multiple columns.
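For anyone who finds code easier to read than formula syntax, the parts described above can be modelled roughly like this in Python. This is only a sketch of the logic, not how Excel implements it; the function name and the sample table are invented for the example.

```python
def vlookup(table, invoice, column_name):
    """Model of VLOOKUP(invoice, table, MATCH(column_name, header_row, 0), FALSE)."""
    header, *rows = table
    # The MATCH part: find which column holds the requested heading.
    col = header.index(column_name)
    for row in rows:
        # FALSE means exact match only, compared against the first column.
        if row[0] == invoice:
            return row[col]  # only the FIRST matching row is returned
    return None  # Excel would show #N/A here


# Hypothetical data: header row followed by invoice rows
table = [
    ["Invoice", "Name", "City"],
    [101, "Ann", "Oslo"],
    [102, "Bob", "Rome"],
    [101, "Cy", "Bern"],
]
print(vlookup(table, 102, "City"))
```

Note that the loop stops at the first row it finds, so an invoice number that appears on several rows only ever gives back the first of them.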
The formula is pretty easy to adapt if you want to move around your table and the area where you do your lookup. Here's an example:
Bonus
If you want to add a little spiff, you can add a dropdown to the Invoice # field so that the user gets auto-completion and the option to browse existing values like so: