VB.Net: Read Table from rtf-File

I have some RTF files, each containing a table. Is there a way to get the content of the table into a DataTable? Or is there a way to convert the table to CSV?

I'll post this as a part answer only, as it is not complete, but can be used to solve the issue that you have.
From the document specified in my comment I found this detail...
Table Definitions
There is no RTF table group; instead, tables are specified as paragraph properties. A table is represented as a sequence of table rows. A table row is a contiguous series of paragraphs partitioned into cells. The table row begins with the \trowd control word and ends with the \row control word. Every paragraph that is contained in a table row must have the \intbl control word specified or inherited from the previous paragraph. A cell may have more than one paragraph in it; the cell is terminated by a cell mark (the \cell control word), and the row is terminated by a row mark (the \row control word). Table rows can also be positioned. In this case, every paragraph in a table row must have the same positioning controls (see the controls on the Positioned Objects and Frames subsection of this Specification). Table properties may be inherited from the previous row; therefore, a series of table rows may be introduced by a single .
You can find this detail from page 93 onward, and it seems to provide the bulk of what you need to know.
From this point you should read the file into a string and then search it for each subsequent occurrence of \trowd (allowing for the closing \row command). This should allow the traversal of all tables within the RTF document. Using this method, and by analysing data within the table, you should be able to ascertain what is important to your requirements.
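The search described above can be sketched roughly as follows. This is a minimal Python illustration rather than VB.Net, and it makes several assumptions: the function names rtf_table_rows and rows_to_csv are made up for this example, the table is simple (no nested tables, no escaped characters such as \'xx or \u sequences), and every control word inside a cell can simply be discarded. A real RTF file will likely need more careful handling.

```python
import csv
import io
import re

# One table row: everything from \trowd up to the closing \row control word.
ROW_RE = re.compile(r"\\trowd(?![a-zA-Z])(.*?)\\row(?![a-zA-Z])", re.DOTALL)
# The \cell mark that ends a cell (but not \cellx, which defines a cell edge).
CELL_RE = re.compile(r"\\cell(?![a-zA-Z])")
# Any control word: backslash, letters, optional numeric argument, optional space.
CONTROL_RE = re.compile(r"\\[a-zA-Z]+-?\d*[ ]?")


def rtf_table_rows(rtf_text):
    """Return a list of rows, each row being a list of plain-text cell values."""
    rows = []
    for match in ROW_RE.finditer(rtf_text):
        # Text after the last \cell mark is row formatting, not a cell.
        parts = CELL_RE.split(match.group(1))[:-1]
        cells = []
        for part in parts:
            text = CONTROL_RE.sub("", part)                   # drop control words
            text = text.replace("{", "").replace("}", "")     # drop group braces
            cells.append(text.strip())
        rows.append(cells)
    return rows


def rows_to_csv(rows):
    """Serialise the extracted rows as CSV text."""
    buffer = io.StringIO()
    csv.writer(buffer).writerows(rows)
    return buffer.getvalue()
```

The same two steps (find each \trowd ... \row span, then split it on \cell) should carry over to VB.Net with Regex, and the resulting rows could be added to a DataTable instead of being written out as CSV.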

Related

How can I determine whether a table row immediately follows a page break

Given a Word table which spans several pages, how can my code determine that a table row is the first following an automatic page break? Note that table rows are of different heights thus a solution of the form "every 13th row is the first row on a new page" won't work.
The point of this would be to add extra text in the first cell at the top of every new page.

Use wdActiveEndPageNumber for that table row. Be sure that row is as big as it's going to get before checking the page number.
n = word.ActiveDocument.Tables(a).rows(b).Range.Information(wdActiveEndPageNumber)

If the table is large enough to split over several pages, compare the page numbers of consecutive rows until the page number changes. There are no page breaks to speak of.
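The comparison of consecutive rows can be sketched like this. It is a minimal Python illustration of the logic only, and it assumes the page numbers have already been collected, one per table row in order, from Range.Information(wdActiveEndPageNumber); the function name rows_starting_new_page is made up for this example.

```python
def rows_starting_new_page(page_numbers):
    """Return the 1-based indexes of rows whose page differs from the previous row's.

    page_numbers: one page number per table row, in row order.
    """
    first_rows = []
    previous_page = None
    for index, page in enumerate(page_numbers, start=1):
        # The very first row has nothing to compare against, so it is skipped.
        if previous_page is not None and page != previous_page:
            first_rows.append(index)
        previous_page = page
    return first_rows
```

For page numbers 1, 1, 1, 2, 2, 3 this returns rows 4 and 6. Since the value is the page on which the row's range ends, a row that itself splits across two pages would presumably be reported on the later page, so that case may need separate handling.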

How do I format/tag an accessible PDF table that spans multiple pages horizontally?

I'm responsible for remediating a PDF that has been generated by a third-party, proprietary system for which I have no access to the layout or design. The goal is to pass the Adobe Acrobat DC accessibility checker before publication.
Some of the tables in the PDF span multiple pages horizontally (i.e. with a page break at column 4 of 7). Thus far, I have designated each piece of text content as a "Cell" and grouped those into a "Table Row" tag and defined each header and sub-header as a "Table Header Cell".
However, Acrobat DC seems to get confused as to the relative size and spacing of each table element. It is creating phantom column headers and rearranging or combining rows in order to fit the appearance of a more standard layout PER PAGE. But since I need one cohesive table to span TWO PAGES, this is breaking my accessibility.
Depending on how I nest my table elements, I get a table layout like one of the two examples below:
Example when including blank cells for multi-column header rows
Example when defining the column span of multi-column header rows as "7"
As you can see, the layout is not uniform and does not pass regularity checks. Plus, as I add more rows with several blank cells, the table editor produces an error that reads:
"Unknown Table Structure Encountered"
The only way I have managed to remove this error is to exclude the bolded main-section sub-headers from the tag structure entirely, but I cannot just leave them as untagged content and still pass the checker.
Please help.

I signed up just to reply to this comment:
Kevin, thanks for replying. Because of the malformed grid, I cannot even click on the cells on Page 2 in order to associate headers. Is there a way to define table structure without using the Table Editor mode? – Glamador Apr 3 at 12:27
but I don't have the rep yet to do so:
Glamador - I know this can't help you half a year later, but it might help in the future: I encountered this in a document this week and figured out the "why" and how to get the Table Editor back, though not the easiest or best way to solve the tagging in Acrobat. What is denying you the Table Editor is the table header (TH) cell you created that spans multiple pages.
So if you set a table header cell to something like Row Span: 7, and 3 of those rows are on the second page, Acrobat will give you the "Unknown table structure encountered. Please retag this table using the Reading Order Tool to possibly fix the problem." error any time you try to use the Table Editor on the table that contains that header cell. (I have only seen this with a multi-page row span, but I assume a multi-page column span behaves the same way.)
To get the Table Editor back (this does not solve the accessibility tagging, it only stops the error on your table):
Go to your tags
Create a new empty Table Header Cell
Drag the content displayed in the tag from the problem TH to your new TH
Delete the [multiple page row/column spanning, but now empty] problem TH
Repeat for every other TH in the same table that spans multiple pages
You can now use Table Editor again
Note: Because you can't use the Table Editor once these problem headers exist, you can't use it to see which THs are set to span multiple pages, or to see their row/column spans. If you are going back to check a document you tagged earlier, you will have to look at the document itself and work out which headers are the likely problems to replace. If you create such a header span again in a table that goes across multiple pages, you will lose the Table Editor again until you delete the tag with the page-spanning issue.
I haven't found out whether you can combine TH Row Span settings with IDs/Associated Header Cell IDs and have the user's software recognise both, so I've been doing the tedious ID association on large but simple tables as my "it's tagged correctly" option. Unfortunately it isn't nearly as fast and easy as Row Spans.

You can edit the tag's object properties by right-clicking on the tag and then you can add an ID there if it doesn't already have one. Be sure each data cell is associated with a header cell. PAC's screen reader preview will also give a good view of the layout to help you get everything associated correctly.

Is there a (creative) way to hide a text field in InDesign if there is no information in the data merge field?

I am creating a data-merge document in InDesign.
There are various tables that I've created which only show as many rows as there is actual data in the field, through some creative table and cell styles.
Now I've been asked to make an entirely separate table show only if there is information in any of those fields.
I'm at a total loss. With the way the current structure is set up, I can cause it to not display any text, but it still shows empty header cells and one line of empty row cells.
Pre-DataMerge, with the data fields
Post-Datamerge, with the resulting empty cells
Any creative ideas to hide that table? I was thinking there might be a way to hide the entire text field, if not the table. Maybe a script? I tried one that deletes blank tables, but that didn't seem to work after the data-merge was run.

I am not sure you can get that level of processing with InDesign data merge. You could write a script that removes those tables after the merge, or use a dedicated plugin such as EasyCatalog that can take care of such empty items natively.

How do I insert a table into a cell using word vba or a table within a table?

I am new to VBA and am creating a script to generate a report from a DB. I have been able to assemble a general draft of my report, but I need to insert a table into an existing cell inside a Word document. I have been searching around but have not managed to find the right search terms to get any guidance on how to achieve this. If I can do it with my mouse, I am sure I can do it through scripting. Any resources that would point me in the right direction will be deeply appreciated.

OK, I found a way. The secret is in the Range, which specifies where you want your nested table to be placed.
I am using PowerShell, so the syntax might vary a little.
First, create the outer table in the document. (This assumes you already have a document object; if not, I am not going through that part here, since there are already several question/answer pairs on that subject.)
$TableX = $oDoc.Tables.Add($oDoc.Bookmarks("TableX").Range, 4, 3)
$TableX holds the new table. We call the Tables.Add method on $oDoc (our document object) and pass it the Range of a bookmark named TableX. I won't explain bookmarks at length; for practical purposes, naming the bookmark lets us refer to that location by name later if we need to add data or manipulate the table in any way. The Range is what tells the document where the table should be placed, which here is wherever the TableX bookmark sits in the document.
Finally, we specify how many rows and columns we would like in the table (4 rows and 3 columns here).
That is all we need to create a table. Now the tricky part: how do we insert a nested table, and how do we specify where this second table should be nested?
Well, with this:
$oNestedTable = $oDoc.Tables.Add($TableX.Cell(4, 2).Range,7,3)
We name our nested table $oNestedTable and call the same Tables.Add method as before to add a new table to the document. Look carefully at the difference, though: the range argument now points to a specific cell in our first table, namely the cell in the fourth row and second column. That is where we explicitly tell the document to insert a new table with 7 rows and 3 columns.
I hope this gives you some bare minimum guidance.
Regards

Best way to handle multi-valued fields as a view/grid

In several Notes applications, instead of handling related data as separate documents, if the size of the data is small (less than the 32k limit), I'll make several multi-valued fields and display them in what I call a "List Panel". It's a table where each column displays one multi-value field. Since fielda(1) goes with fieldb(1), which goes with fieldc(1), there is a concept of rows. (I did a similar thing in my auditing routine discussed here.)
It is always assumed that each field has exactly the same number of elements.
All the multi-value fields are then stored on the single document. This avoids several coding conventions that made my eyes bleed like having date changed, who changed it, new value fields for each field we wanted to audit. Another thing that this kept to a minimum was having to provide multiple fields for the same thing that locked you into a limit. Taxrate1, Taxrate2, Taxrate3, etc...
In my "List Panel" the first column is a vertical checkbox group (one checkbox for each element in my lists). This is so I can select one item to bring up and edit, or select multiple values to delete "rows" or apply some kind of mass change to them.
What would be the best way to handle this under XPages to get this functionality? I tried making a table but am having a devil of a time getting the checkboxes to line up with their corresponding data items.
Views and Dojo grids seem to assume we're using a document for each row.

This TableWalker may provide what you want http://www-10.lotus.com/ldd/ddwiki.nsf/dx/Tutorial-Introduction-to-XPages-Exercise-23
It was created when XPages was all very new, so it's SSJS rather than Java. But if you're comfortable with Java, converting it probably won't be a challenge.

You could use a repeat control to display the values and build a table using the table row tags in the repeat. You would want to calculate the id of the checkbox to be able to take an action on that selected row. The repeat var would be just one of your multi-value fields and you use the index of the repeat to get the value for that row from the other multi-value fields.
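The index-based lookup behind the repeat approach can be sketched as follows. This is a language-neutral illustration written in Python, not XPages code; the names build_rows and delete_rows and the dictionary layout (field name mapped to its list of values) are assumptions made for this example. It relies on the premise stated in the question that every multi-value field has exactly the same number of elements.

```python
def build_rows(fields):
    """Turn parallel multi-value fields into a list of row dictionaries.

    fields: mapping of field name -> list of values; element i of every
    list belongs to row i.
    """
    lengths = {len(values) for values in fields.values()}
    if len(lengths) > 1:
        raise ValueError("all multi-value fields must have the same number of elements")
    names = list(fields)
    return [dict(zip(names, values)) for values in zip(*fields.values())]


def delete_rows(fields, selected):
    """Return a copy of fields with the checked row indexes removed from every list."""
    drop = set(selected)
    return {
        name: [value for index, value in enumerate(values) if index not in drop]
        for name, values in fields.items()
    }
```

In the repeat control, the repeat index plays the role of the row index here: the checkbox for row i is tied to index i, and an action on the selected rows applies the same set of indexes to every multi-value field before the document is saved.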
