Table row starts on a new page in iText PDF

I am using PdfPTable to create a table in a PDF. The table has a single row, and the last column of that row contains data that is taller than the remaining height of the page. As a result, the row starts on the next page while the table headers stay on the previous page, leaving a large blank space below the header on the first page.
Can anybody suggest how I can split the row over multiple pages?
Thanks

Please read chapter 4 of my book, or browse the extensive documentation on the iText site.
By default, table rows aren't split right away. iText tries to add a complete row to the current page; if the row doesn't fit, it tries again on the next page. Only if the row doesn't fit on the next page either does iText split it. This is the default behavior, so what you see in your application is expected.
You can change this default behavior. One method lets you drop content that doesn't fit (this is not what you want), and another lets you split rows when they don't fit on the current page (this is what you want).
The method you need is used in the HeaderFooter2 example:
PdfPTable table = getTable(...);
table.setSplitLate(false);
By default, the split-late property is true: iText splits rows as late as possible. Calling setSplitLate(false) makes iText split rows immediately.
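A minimal sketch of the question's scenario, assuming iText 5 (the com.itextpdf packages) is on the classpath: a header row plus a single data row whose last cell is taller than a page. Only setSplitLate(false) comes from the answer; the class name, cell texts, line count and output filename are hypothetical, and setHeaderRows(1) is an addition that repeats the header row on every page the table spans.

```java
import com.itextpdf.text.Document;
import com.itextpdf.text.DocumentException;
import com.itextpdf.text.Phrase;
import com.itextpdf.text.pdf.PdfPTable;
import com.itextpdf.text.pdf.PdfWriter;

import java.io.FileOutputStream;
import java.io.IOException;

class SplitRowEarly {

    // Builds a table with one header row and one data row whose last cell
    // holds `lines` lines of text (enough lines make it taller than a page).
    static PdfPTable buildTable(int lines) {
        PdfPTable table = new PdfPTable(3);
        table.setWidthPercentage(100);
        // Repeat the first row as a header on each page (hypothetical addition).
        table.setHeaderRows(1);
        table.addCell("Column 1");
        table.addCell("Column 2");
        table.addCell("Column 3");

        StringBuilder details = new StringBuilder();
        for (int i = 1; i <= lines; i++) {
            details.append("Detail line ").append(i).append('\n');
        }
        table.addCell("cell 1");
        table.addCell("cell 2");
        table.addCell(new Phrase(details.toString()));

        // Split a row that doesn't fit on the current page right away,
        // instead of first moving the whole row to the next page.
        table.setSplitLate(false);
        return table;
    }

    public static void main(String[] args) throws IOException, DocumentException {
        Document document = new Document();
        PdfWriter.getInstance(document, new FileOutputStream("split_early.pdf"));
        document.open();
        document.add(buildTable(200));
        document.close();
    }
}
```

With the default (split late), the same table would first push the long row to the next page, which is what produces the blank space described in the question.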

Related

How can I determine whether a table row immediately follows a page break

Given a Word table which spans several pages, how can my code determine that a table row is the first following an automatic page break? Note that table rows are of different heights thus a solution of the form "every 13th row is the first row on a new page" won't work.
The point of this would be to add extra text in the first cell at the top of every new page.
Query the row's range with Information(wdActiveEndPageNumber). Make sure the row has reached its final height (all of its content is in place) before checking the page number.
n = word.ActiveDocument.Tables(a).rows(b).Range.Information(wdActiveEndPageNumber)
If the table is large enough to span several pages, compare the page numbers of consecutive rows: the first row whose page number differs from the previous row's is the first row on a new page. There are no explicit page breaks to look for.
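The comparison above can be sketched in Java, assuming the page numbers have already been collected into an array, one per row (for example from Range.Information(wdActiveEndPageNumber), as in the VBA line). Word automation itself is not shown; the class name, method name and sample page numbers are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class FirstRowOnPage {

    // pageOfRow[i] is the page number reported for row i (0-based index).
    // Returns the indices of rows whose page differs from the previous row's,
    // i.e. the first row after each automatic page break.
    static List<Integer> firstRowsAfterPageBreak(int[] pageOfRow) {
        List<Integer> result = new ArrayList<>();
        for (int i = 1; i < pageOfRow.length; i++) {
            if (pageOfRow[i] != pageOfRow[i - 1]) {
                result.add(i);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        int[] pages = {1, 1, 1, 2, 2, 3};
        System.out.println(firstRowsAfterPageBreak(pages)); // prints [3, 5]
    }
}
```

Row 0 is never reported, because it does not follow a page break inside the table.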

How do I format/tag an accessible PDF table that spans multiple pages horizontally?

I'm responsible for remediating a PDF that has been generated by a third-party, proprietary system for which I have no access to the layout or design. The goal is to pass the Adobe Acrobat DC accessibility checker before publication.
Some of the tables in the PDF span multiple pages horizontally (i.e. with a page break at column 4 of 7). So far, I have tagged each piece of text content as a "Cell", grouped those into "Table Row" tags, and defined each header and sub-header as a "Table Header Cell".
However, Acrobat DC seems to get confused as to the relative size and spacing of each table element. It is creating phantom column headers and rearranging or combining rows in order to fit the appearance of a more standard layout PER PAGE. But since I need one cohesive table to span TWO PAGES, this is breaking my accessibility.
Depending on how I nest my table elements, I get a table layout like one of the two examples below:
(Image: example when including blank cells for multi-column header rows)
(Image: example when defining the column span of multi-column header rows as "7")
As you can see, the layout is not uniform and does not pass regularity checks. Plus, as I add more rows with several blank cells, the table editor produces an error that reads:
"Unknown Table Structure Encountered"
The only way I have managed to remove this error is to exclude the bolded main-section sub-headers from the tag structure entirely, but I cannot just leave them as untagged content and pass the checker.
Please help.
I signed up just to reply to this comment, but I don't have the reputation to do so yet:
Kevin, thanks for replying. Because of the malformed grid, I cannot even click on the cells on Page 2 in order to associate headers. Is there a way to define table structure without using the Table Editor mode? – Glamador Apr 3 at 12:27
Glamador - I know this can't help you half a year later, but it might in the future: I ran into this in a document this week and figured out why it happens and how to get the Table Editor back, but not the easiest or best way to solve the tagging in Acrobat. The issue denying you the Table Editor is the table header (TH) cell you created that spans multiple pages.
So if you set a table header cell to something like Row Span: 7, and 3 of those rows are on the second page, Acrobat will give you the "Unknown table structure encountered. Please retag this table using the Reading Order Tool to possibly fix the problem." error any time you try to use the Table Editor on that table. (I'm not working with column spans, but I assume a column span across pages does the same.)
To get the Table Editor back (this doesn't solve the accessibility tagging, but it stops the error on your table):
Go to your tags
Create a new empty Table Header Cell
Drag the content displayed in the tag from the problem TH to your new TH
Delete the problem TH (the one with the multi-page row/column span, now empty)
Repeat for every other TH in the same table that spans pages
You can now use Table Editor again
Note: Because you can't use the Table Editor once these problem headers exist, you also can't use it to see which THs are set to span multiple pages or what their row/column spans are. If you have already tagged the document and are checking it later, you'll have to look at the document itself and work out which headers are the likely culprits to replace. If you create a page-spanning header span in that table again, you'll lose the Table Editor again until you delete the offending tag.
I haven't found out whether you can combine TH Row Span settings with IDs/Associated Header Cell IDs and have the user's software identify both, so on large but simple tables I've been doing the tedious ID association as my "it's tagged correctly" option. Unfortunately, it isn't nearly as fast and easy as Row Spans.
You can edit the tag's object properties by right-clicking on the tag and then you can add an ID there if it doesn't already have one. Be sure each data cell is associated with a header cell. PAC's screen reader preview will also give a good view of the layout to help you get everything associated correctly.

How to load an Excel file that has its header in two rows using Pentaho

I have an Excel file that has its header in two rows (the first column's header is in the second row, and the remaining columns' headers span the first and second rows), as shown in the image.
I have to load this Excel file into a table using Pentaho.
Please let me know how to load it.
Thanks,
Actually, you define a cell block to read from, either implicitly (starting at the top left) or explicitly (by giving offsets on the Sheets tab). You can tell Spoon that you want the first row of that block treated as a header row containing field names. This lets you populate the field list (the Get Fields button) at design time, as a convenience feature.
If the names don't suit you, just change them.

VB.Net: Read a table from an RTF file

I have some RTF files that contain a table. Is there a way to get the table's content into a DataTable? Or is there a way to convert the table to CSV?
I'll post this as a partial answer only; it isn't complete, but it can be used to solve your issue.
In the document referenced in my comment, I found this detail:
Table Definitions
There is no RTF table group; instead, tables are specified as paragraph properties. A table is represented as a sequence of table rows. A table row is a contiguous series of paragraphs partitioned into cells. The table row begins with the \trowd control word and ends with the \row control word. Every paragraph that is contained in a table row must have the \intbl control word specified or inherited from the previous paragraph. A cell may have more than one paragraph in it; the cell is terminated by a cell mark (the \cell control word), and the row is terminated by a row mark (the \row control word). Table rows can also be positioned. In this case, every paragraph in a table row must have the same positioning controls (see the controls on the Positioned Objects and Frames subsection of this Specification). Table properties may be inherited from the previous row; therefore, a series of table rows may be introduced by a single .
This detail starts on page 93, and that section seems to cover most of what you need to know.
From there, read the file into a string and search it for each occurrence of \trowd, together with its closing \row control word. This lets you traverse every table in the RTF document. With this approach, and by analysing the data within each row, you should be able to extract what matters for your requirements.
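The approach above can be sketched in Java (the question uses VB.Net, but .NET's System.Text.RegularExpressions supports the same patterns). This is a naive sketch under stated assumptions: the class and method names are hypothetical, it treats everything from \trowd to the next \row as one row and splits it on \cell marks, it does not handle nested tables, escaped characters such as \'e9 or \{, or paragraph marks inside cells, and it writes every cell as a quoted CSV field.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RtfTableReader {
    // A row runs from \trowd to the next \row; (?![a-z]) stops \row from
    // matching a longer control word.
    private static final Pattern ROW =
        Pattern.compile("\\\\trowd(?![a-z])(.*?)\\\\row(?![a-z])", Pattern.DOTALL);
    // Cell mark; (?![a-z]) excludes \cellx, which only defines cell boundaries.
    private static final Pattern CELL = Pattern.compile("\\\\cell(?![a-z])");
    // Control words such as \intbl or \cellx1000, plus one delimiting space.
    private static final Pattern CONTROL_WORD = Pattern.compile("\\\\[a-z]+-?\\d* ?");

    static List<List<String>> parseRows(String rtf) {
        List<List<String>> rows = new ArrayList<>();
        Matcher m = ROW.matcher(rtf);
        while (m.find()) {
            // Limit -1 keeps trailing empty pieces; the last piece is whatever
            // follows the final \cell mark, so it is not a cell and is skipped.
            String[] pieces = CELL.split(m.group(1), -1);
            List<String> cells = new ArrayList<>();
            for (int i = 0; i < pieces.length - 1; i++) {
                cells.add(cleanText(pieces[i]));
            }
            rows.add(cells);
        }
        return rows;
    }

    // Naive: strips control words and group braces, leaves escapes untouched.
    static String cleanText(String fragment) {
        String text = CONTROL_WORD.matcher(fragment).replaceAll("");
        return text.replace("{", "").replace("}", "").trim();
    }

    // Writes each cell as a quoted CSV field, doubling embedded quotes.
    static String toCsv(List<List<String>> rows) {
        StringBuilder sb = new StringBuilder();
        for (List<String> row : rows) {
            List<String> quoted = new ArrayList<>();
            for (String cell : row) {
                quoted.add("\"" + cell.replace("\"", "\"\"") + "\"");
            }
            sb.append(String.join(",", quoted)).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        // ISO-8859-1 maps every byte to one char, so no input is rejected.
        String rtf = new String(Files.readAllBytes(Paths.get(args[0])), StandardCharsets.ISO_8859_1);
        System.out.print(toCsv(parseRows(rtf)));
    }
}
```

Loading the result into a DataTable instead of CSV would only change the output step; the row and cell detection stays the same.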

Create a multi-page document from a single-page template using docx4j

I am planning to use docx4j for search and replace in a template. I'd like to create a page for each member in a list; basically, I need to replicate the same page from the template for each member. I have done simple search and replace, but this one is a bit more complex, so I need some sample examples. Here is my requirement:
I have a docx template whose content contains placeholders.
There is a table with 3 columns in it, and I need to replace the placeholders in each column with different values, such as first name and last name. The number of rows can vary anywhere from 1 to 200, so the table may run past one page. If it does, the table header needs to repeat on the next page too.
I want to copy the same template onto every page and replace the placeholders, creating a single document with multiple pages, one page per member.
Please provide me with the example.
Appreciate the help.
Thanks.