itext pdf with list of image and text below - pdf

Need help to generate a pdf with a list of image and text describing the image under it.
Tried the below, but getting image and text beside each other. Please need help with this. Thanks.
........
PdfPTable table = new PdfPTable(1);
table.setHorizontalAlignment(Element.ALIGN_CENTER);
table.setSplitRows(true);
table.setWidthPercentage(90f);
Paragraph paragraph = new Paragraph();
for (int counter = 0; counter < empSize; counter++) {
String imgPath = ... ".png");
Image img = Image.getInstance(imgPath);
img.scaleAbsolute(110f, 95f);
Paragraph textParagraph = new Paragraph("Test" + counter));
textParagraph.setLeading(Math.max(img.getScaledHeight(), img.getScaledHeight()));
textParagraph.setAlignment(Element.ALIGN_CENTER);
Phrase imageTextCollectionPhase = new Phrase();
Phrase ph = new Phrase();
ph.add(new Chunk(img, 0, 0, true));
ph.add(textParagraph);
imageTextCollectionPhase.add(ph);
paragraph.add(imageTextCollectionPhase);
}
PdfPCell cell = new PdfPCell(paragraph);
table.addCell(cell);
doc.add(table);

I assume that you want to get a result that looks like this:
In your case, you are adding all the content (all the images and all the text) to a single cell. You should add them to separate cells as is done in the MultipleImagesInTable example:
public void createPdf(String dest) throws IOException, DocumentException {
Image img1 = Image.getInstance(IMG1);
Image img2 = Image.getInstance(IMG2);
Image img3 = Image.getInstance(IMG3);
Document document = new Document();
PdfWriter.getInstance(document, new FileOutputStream(dest));
document.open();
PdfPTable table = new PdfPTable(1);
table.setWidthPercentage(20);
table.addCell(img1);
table.addCell("Brazil");
table.addCell(img2);
table.addCell("Dog");
table.addCell(img3);
table.addCell("Fox");
document.add(table);
document.close();
}
You can easily change this proof of concept so that a loop is used. Just make sure you put the addCell() methods inside the loop instead of outside the loop.
You can also explicitly create a PdfPCell and combine the text and the image in the same cell like this:
PdfPCell cell = new PdfPCell();
cell.addElement(img1);
cell.addElement(new Paragraph("Brazil"));
table.addCell(cell);

Related

docx4j how to insert image into table cell

I can't insert image into table cell using docx4j using following code:
WordprocessingMLPackage wordPackage = WordprocessingMLPackage.createPackage(PageSizePaper.A4,true);
ObjectFactory factory=Context.getWmlObjectFactory();Tbl table = factory.createTbl();
Tr tableRow = factory.createTr();
byte[] imageBytes = Base64.getDecoder().decode(t.getBase64Image());
BinaryPartAbstractImage imagePart = BinaryPartAbstractImage.createImagePart(wordPackage, imageBytes);
Inline inline = imagePart.createImageInline("image", "image", 0, 1, false);
P celPar = addInlineImageToParagraph(inline, factory);
Tc tableCell = factory.createTc();
tableCell.getContent().clear();
tableCell.getContent().add(celPar);
tableRow.getContent().add(tableCell);
wordPackage.getMainDocumentPart().addObject(table);
private P addInlineImageToParagraph(Inline inline, ObjectFactory factory) {
P paragraph = factory.createP();
R run = factory.createR();
paragraph.getContent().add(run);
Drawing drawing = factory.createDrawing();
run.getContent().add(drawing);
drawing.getAnchorOrInline().add(inline);
return paragraph;
}
Word has problem displaying image. I realy don't know where's problem
If you looked at a docx resulting from your code, you would see:
<w:tbl></w:tbl>
You are just missing
table.getContent().add(tableRow);
EDIT 24 Sept
You didn't say you until now that you were trying to add your image in a footer!
For this you need to specify that part, so the rel attaches to the footer. So use https://github.com/plutext/docx4j/blob/master/src/main/java/org/docx4j/openpackaging/parts/WordprocessingML/BinaryPartAbstractImage.java#L247 or https://github.com/plutext/docx4j/blob/master/src/main/java/org/docx4j/openpackaging/parts/WordprocessingML/BinaryPartAbstractImage.java#L339 etc ie one of the signatures which contains Part sourcePart

Automatic PDF Rendering

I've read the MigraDoc/PdfSharp documentation, but it feels a bit thin. I want to render out a PDF, but not have to manually specify width and height. I just want it to align right, center, or left (of margins), and handle all the sizing for me.
Public Sub Write()
Dim document As PdfDocument = New PdfDocument()
Dim page As PdfPage = document.AddPage()
Dim gfx As XGraphics = XGraphics.FromPdfPage(page)
gfx.MUH = PdfFontEncoding.Unicode
gfx.MFEH = PdfFontEmbedding.Default
Dim font As XFont = New XFont("Verdana", 13, XFontStyle.Bold)
Dim migraDocument As New Document
Dim sec As Section = migraDocument.AddSection()
Dim quotationHeader As New Paragraph
quotationHeader.AddText("Quotation" & vbNewLine)
quotationHeader.Format.Alignment = ParagraphAlignment.Right
sec.Add(quotationHeader)
Dim dhAddressInfo As New Paragraph
dhAddressInfo.AddText("ADDRESS GOES HERE")
dhAddressInfo.Format.Alignment = ParagraphAlignment.Left
sec.Add(dhAddressInfo)
Dim quotationInfo As New Paragraph
quotationInfo.AddText("QUOTATION INFO AND DATE HERE")
quotationInfo.Format.Alignment = ParagraphAlignment.Right
sec.Add(quotationInfo)
Dim customerBilling As New Paragraph
With Customer
customerBilling.AddText("CUSTOMER BILLING OBJECT PROPERTIES HERE")
End With
customerBilling.Format.Alignment = ParagraphAlignment.Left
sec.Add(customerBilling)
Dim authorInfo As New Paragraph
authorInfo.AddText("AUTHOR INFO HERE")
authorInfo.Format.Alignment = ParagraphAlignment.Right
sec.Add(authorInfo)
Dim pricingTable As New Table
'pricingTable.Format.Alignment = ParagraphAlignment.Center
pricingTable.AddColumn("13cm")
pricingTable.AddColumn("13cm")
Dim headerRow As New Row
headerRow = pricingTable.AddRow()
headerRow.HeadingFormat = True
headerRow.Cells(0).AddParagraph("Description")
headerRow.Cells(1).AddParagraph("Amount")
For i As Integer = 0 To SelectedPrices.Count - 1
Dim row As Row = pricingTable.AddRow()
Dim price As Pricing = SelectedPrices(i)
row.Cells(0).AddParagraph(price.Item)
row.Cells(1).AddParagraph(price.Price * price.Quantity)
Next
Dim totalRow As Row = pricingTable.AddRow()
totalRow.Cells(0).AddParagraph("Total: ")
Dim total As Double = 0
For Each price As Pricing In SelectedPrices
total = total + (price.Price * price.Quantity)
Next
totalRow.Cells(1).AddParagraph(total.ToString)
sec.Add(pricingTable)
Dim docRenderer As DocumentRenderer = New DocumentRenderer(migraDocument)
docRenderer.PrepareDocument()
docRenderer.RenderObject(gfx, XUnit.FromCentimeter(0), XUnit.FromCentimeter(0), "10cm", quotationHeader)
docRenderer.RenderObject(gfx, XUnit.FromCentimeter(0), XUnit.FromCentimeter(2), "10cm", dhAddressInfo)
docRenderer.RenderObject(gfx, XUnit.FromCentimeter(5), XUnit.FromCentimeter(2), "10cm", quotationInfo)
docRenderer.RenderObject(gfx, XUnit.FromCentimeter(0), XUnit.FromCentimeter(6), "10cm", customerBilling)
docRenderer.RenderObject(gfx, XUnit.FromCentimeter(5), XUnit.FromCentimeter(6), "10cm", authorInfo)
docRenderer.RenderObject(gfx, XUnit.FromCentimeter(3), XUnit.FromCentimeter(10), "10cm", pricingTable)
document.Save(Environment.CurrentDirectory & "\test.pdf")
End Sub
Notice at the bottom I'm specifying the X and Y coordinates of each section. I just want to define spacing. Alignment should take care of the rest.
I found a different tutorial that uses PdfDocumentRenderer and shows how to correctly use it. It's not in VB, but quite easily translated. I copied it below in case the link goes dead.
http://www.c-sharpcorner.com/UploadFile/aftab_ku/create-object-model-document-and-renders-them-into-pdf/
public Document CreateDocument()
{
// Create a new MigraDoc document
this.document = new Document();
this.document.Info.Title = "";
this.document.Info.Subject = "";
this.document.Info.Author = "Aftab";
DefineStyles();
CreatePage();
FillContent();
return this.document;
}
Here, CreateDocument() in PDFform.cs creates a new MigraDoc. Take a look at the three functions called for creating style and page and fill the content of the tables.
//
void DefineStyles()
{
// Get the predefined style Normal.
Style style = this.document.Styles["Normal"];
// Because all styles are derived from Normal, the next line changes the
// font of the whole document. Or, more exactly, it changes the font of
// all styles and paragraphs that do not redefine the font.
style.Font.Name = "Verdana";
style = this.document.Styles[StyleNames.Header];
style.ParagraphFormat.AddTabStop("16cm", TabAlignment.Right);
style = this.document.Styles[StyleNames.Footer];
style.ParagraphFormat.AddTabStop("8cm", TabAlignment.Center);
// Create a new style called Table based on style Normal
style = this.document.Styles.AddStyle("Table", "Normal");
style.Font.Name = "Verdana";
style.Font.Name = "Times New Roman";
style.Font.Size = 9;
// Create a new style called Reference based on style Normal
style = this.document.Styles.AddStyle("Reference", "Normal");
style.ParagraphFormat.SpaceBefore = "5mm";
style.ParagraphFormat.SpaceAfter = "5mm";
style.ParagraphFormat.TabStops.AddTabStop("16cm", TabAlignment.Right);
}
DefineStyles() does the job of styling the document:
void CreatePage()
{
// Each MigraDoc document needs at least one section.
Section section = this.document.AddSection();
// Put a logo in the header
Image image= section.AddImage(path);
image.Top = ShapePosition.Top;
image.Left = ShapePosition.Left;
image.WrapFormat.Style = WrapStyle.Through;
// Create footer
Paragraph paragraph = section.Footers.Primary.AddParagraph();
paragraph.AddText("Health And Social Services.");
paragraph.Format.Font.Size = 9;
paragraph.Format.Alignment = ParagraphAlignment.Center;
............
// Create the item table
this.table = section.AddTable();
this.table.Style = "Table";
this.table.Borders.Color = TableBorder;
this.table.Borders.Width = 0.25;
this.table.Borders.Left.Width = 0.5;
this.table.Borders.Right.Width = 0.5;
this.table.Rows.LeftIndent = 0;
// Before you can add a row, you must define the columns
Column column;
foreach (DataColumn col in dt.Columns)
{
column = this.table.AddColumn(Unit.FromCentimeter(3));
column.Format.Alignment = ParagraphAlignment.Center;
}
// Create the header of the table
Row row = table.AddRow();
row.HeadingFormat = true;
row.Format.Alignment = ParagraphAlignment.Center;
row.Format.Font.Bold = true;
row.Shading.Color = TableBlue;
for (int i = 0; i < dt.Columns.Count; i++)
{
row.Cells[i].AddParagraph(dt.Columns[i].ColumnName);
row.Cells[i].Format.Font.Bold = false;
row.Cells[i].Format.Alignment = ParagraphAlignment.Left;
row.Cells[i].VerticalAlignment = VerticalAlignment.Bottom;
}
this.table.SetEdge(0, 0, dt.Columns.Count, 1, Edge.Box,
BorderStyle.Single, 0.75, Color.Empty);
}
Here CreatePage() adds a header, footer, and different sections into the document and then the table is created to display the records. Columns from the datatable are added into the table inside the document and then a header row that contains the column names is added.
column = this.table.AddColumn(Unit.FromCentimeter(3));
//creates a new column and width of the column is passed as a parameter.
Row row = table.AddRow();
//A new header row is created
row.Cells[i].AddParagraph(dt.Columns[i].ColumnName);
//this will add the column name to header of the row.
this.table.SetEdge(0, 0, dt.Columns.Count, 1, Edge.Box,
BorderStyle.Single, 0.75, Color.Empty);
//sets the border of the row
void FillContent()
{
...............
Row row1;
for (int i = 0; i < dt.Rows.Count; i++)
{
row1 = this.table.AddRow();
row1.TopPadding = 1.5;
for (int j = 0; j < dt.Columns.Count; j++)
{
row1.Cells[j].Shading.Color = TableGray;
row1.Cells[j].VerticalAlignment = VerticalAlignment.Center;
row1.Cells[j].Format.Alignment = ParagraphAlignment.Left;
row1.Cells[j].Format.FirstLineIndent = 1;
row1.Cells[j].AddParagraph(dt.Rows[i][j].ToString());
this.table.SetEdge(0, this.table.Rows.Count - 2, dt.Columns.Count, 1,
Edge.Box, BorderStyle.Single, 0.75);
}
}
.............
}
FillContent() fills the rows from the datatable into the table inside the document:
row1.Cells[j].AddParagraph(dt.Rows[i][j].ToString());
//adds the value of column into the table row
The Default.aspx file contains the code for generating the PDF:
using MigraDoc.DocumentObjectModel;
using MigraDoc.Rendering;
using System.Diagnostics;
MigraDoc libraries are used for generating PDF documents, and System.Diagnostics for starting a PDF Viewer:
PDFform pdfForm = new PDFform(GetTable(), Server.MapPath("img2.gif"));
// Create a MigraDoc document
Document document = pdfForm.CreateDocument();
document.UseCmykColor = true;
// Create a renderer for PDF that uses Unicode font encoding
PdfDocumentRenderer pdfRenderer = new PdfDocumentRenderer(true);
// Set the MigraDoc document
pdfRenderer.Document = document;
// Create the PDF document
pdfRenderer.RenderDocument();
// Save the PDF document...
string filename = "PatientsDetail.pdf";
pdfRenderer.Save(filename);
// ...and start a viewer.
Process.Start(filename);
The PdfForm object is created and using it, a new MigraDoc is generated. PdfDocumentRenderer renders the PDF document and then saves it. Process.Start(filename) starts a PDF viewer to open the PDF file created using MigraDoc.

How to tell PdfPTable that PdfPCells need to be showed dynamically according to free space on PdfPTable

I have this PDF document that I made with iText in Java.
The PDF Document contains data that is added via PDFPTable objects.
The 'Problem' is that when I have more data then fits on one PDF page, the data is rendered on the next page, leaving me with empty space on the first page. (See the image 'Problem' side).
I would like to have these empty spaces filled with 'PDFPCell' object, see 'Solution' (these PdfPCell object contain another PdfPTable, the data in this PdfPTable must not be 'continued' on the next page of the pdf when it does not fit).
This is a small example in code:
PdfPTable outerTable = new PdfPTable(1);
outerTable.setHorizontalAlignment(Element.ALIGN_LEFT);
outerTable.setWidthPercentage(100);
int i = 0;
while (i < 5)
{
i++;
PdfPTable innerTable = new PdfPTable(new float[] {0.25f, 0.25f, 0.25f, 0.25f});
innerTable .setHorizontalAlignment(Element.ALIGN_LEFT);
innerTable .setWidthPercentage(100);
PdfPCell cell = new PdfPCell(innerTable);
cell.setPadding(0);
innerTable.addCell(new Phrase("test Data"));
innerTable.addCell(new Phrase("test Data"));
innerTable.addCell(new Phrase("test Data"));
innerTable.addCell(new Phrase("test Data"));
outerTable.addCell(cell);
}
document.add(outertable);
document.close();
Please take a look at the DropTablePart example. In this example, I add 4 tables with 19 rows to a ColumnText object. As soon as a table doesn't fit the page, I drop the remaining content of the ColumnText object (which will automatically drop the rest of the table) and I start a new page where a new table will start.
Dropping the content of the ColumnText object can be done in two different ways:
Either:
ct = new ColumnText(writer.getDirectContent());
Or:
ct.setText(null);
The result looks like this:
As you can see, rows 10-18 are dropped from inner table 3.
This is the full code:
public void createPdf(String dest) throws IOException, DocumentException {
Document document = new Document();
PdfWriter writer = PdfWriter.getInstance(document, new FileOutputStream(dest));
document.open();
Rectangle column = new Rectangle(36, 36, 559, 806);
ColumnText ct = new ColumnText(writer.getDirectContent());
ct.setSimpleColumn(column);
for (int i = 0; i < 4; ) {
PdfPTable table = new PdfPTable(new float[]{0.25f, 0.25f, 0.25f, 0.25f});
table.setHorizontalAlignment(Element.ALIGN_LEFT);
table.setWidthPercentage(100);
PdfPCell cell = new PdfPCell(new Phrase("inner table " + (++i)));
cell.setColspan(4);
table.addCell(cell);
for (int j = 0; j < 18; j++) {
table.addCell(new Phrase("test Data " + (j + 1) + ".1"));
table.addCell(new Phrase("test Data " + (j + 1) + ".1"));
table.addCell(new Phrase("test Data " + (j + 1) + ".1"));
table.addCell(new Phrase("test Data " + (j + 1) + ".1"));
}
ct.addElement(table);
if (ColumnText.hasMoreText(ct.go())) {
document.newPage();
ct = new ColumnText(writer.getDirectContent());
ct.setSimpleColumn(column);
}
}
document.close();
}
I didn't use nested tables, because it is generally a bad idea to use nested tables. It has a negative impact on the performance of your application and it usually results in code that is hard to maintain (the programmers who inherit our application will thank you for not using nested tables).

Pdf Merge issue in itextsharp - PDF looks distorted after Merge

I have a simple scenario where I extract pages from a PDF document (or split the document in two parts, if you will) and merge the parts back to a new document, with an option to add new pages in between.
However, in one particular case the resulting document differs from the original one in that couple of pages (in this case pages 4 and 5) look distorted in comparison to the source document.
How can I circumvent the distortion of the pages? The reproduction code below has been tested with iTextSharp versions 5.5.0.0 and 5.5.6.0 (latest at the moment).
You can find the input-File i used here.
void Main()
{
var pathPrefix = #"C:\temp"; // TODO change
var inputDocPath = #"input.pdf";
var part1 = ExtractPages(Path.Combine(pathPrefix, inputDocPath), 1, 2);
var outputPath1 = Path.Combine(pathPrefix, "part1.pdf");
File.WriteAllBytes(outputPath1, part1);
var part2 = ExtractPages(Path.Combine(pathPrefix, inputDocPath), 3);
var outputPath2 = Path.Combine(pathPrefix, "part2.pdf");
File.WriteAllBytes(outputPath2, part2);
var merged = Merge(new[] {
outputPath1,
outputPath2
});
var mergedPath = Path.Combine(pathPrefix, "output.pdf");
File.WriteAllBytes(mergedPath, merged);
}
//Page sizes:
// input: 8,26x11,68; 8,26x11,69; 8,26x11,69; 8,26x11,69; 8,26x11,69; 8,26x11,68; 8,26x11,68
// output: 8,26x11,68; 8,26x11,69; 8,26x11,69; 8,26x11,69; 8,26x11,69; 8,26x11,68; 8,26x11,68
public static byte[] Merge(string[] documentPaths)
{
byte[] mergedDocument;
using (MemoryStream memoryStream = new MemoryStream())
using (Document document = new Document())
{
PdfSmartCopy pdfSmartCopy = new PdfSmartCopy(document, memoryStream);
document.Open();
foreach (var docPath in documentPaths)
{
PdfReader reader = new PdfReader(docPath);
try
{
reader.ConsolidateNamedDestinations();
var numberOfPages = reader.NumberOfPages;
for (int page = 0; page < numberOfPages;)
{
PdfImportedPage pdfImportedPage = pdfSmartCopy.GetImportedPage(reader, ++page);
pdfSmartCopy.AddPage(pdfImportedPage);
}
}
finally
{
reader.Close();
}
}
document.Close();
mergedDocument = memoryStream.ToArray();
}
return mergedDocument;
}
public static byte[] ExtractPages(string pdfDocument, int startPage, int? endPage = null)
{
var reader = new PdfReader(pdfDocument);
var numberOfPages = reader.NumberOfPages;
var endPageResolved = endPage.HasValue ? endPage.Value : numberOfPages;
if (startPage > numberOfPages || endPageResolved > numberOfPages)
string.Format("Error: page indices ({0}, {1}) out of bounds. Document has {2} pages.",
startPage, endPageResolved, numberOfPages).Dump();
byte[] outputDocument;
using (var doc = new Document()) // NOTE use reader.GetPageSizeWithRotation(startPage) ?
using (var msOut = new MemoryStream())
{
var pdfCopyProvider = new PdfCopy(doc, msOut);
doc.Open();
for (var i = startPage; i <= endPageResolved; i++)
{
var page = pdfCopyProvider.GetImportedPage(reader, i);
pdfCopyProvider.AddPage(page);
}
doc.Close();
reader.Close();
outputDocument = msOut.ToArray();
}
return outputDocument;
}
I could reproduce the issue using your code and your test file with iTextSharp 5.5.6. Actually, though, the images are not merely distorted, they have been replaced by other ones! Inspecting the result PDF internally, one observes:
Originally page 3 through 5 each had their own respective Resource dictionary containing different entries than the ones of each other.
After split up, as pages 1 through 3 of part2.pdf, they still had different Resource dictionaries.
In the final merged result, though, page 3 through 5 all refer to the same Resource dictionary object, a copy of the resources of the original page 3!
(As page 3 contains images with the same names as the images on pages 4 and 5, this results in page 3 images being shown on pages 4 and 5.)
Somehow PdfSmartCopy seems to outsmart itself here, using PdfCopy instead creates the expected result.
I assume PdfSmartCopy falsely considers those source dictionaries identical, probably some hash collision without actual equality check.
It might be of interest to note that an equivalent test using Java and iText, SmartMerging.java, does not show the same issue, its result is as expected.
Thus, this looks like an issue of the iTextSharp port or .Net in general.

PDFBox PDFTextStripperByArea region coordinates

In what dimensions and direction is the Rectangle in the
PDFTextStripperByArea's function addRegion(String regionName, Rectangle2D rect).
In other words, where does the rectangle R start and how big is it (dimensions of the origin values, dimensions of the rectangle) and in what direction does it go (direction of the blue arrows in illustration), if new Rectangle(10,10,100,100) is given as a second parameter?
new Rectangle(10,10,100,100)
means that the rectangle will have its upper-left corner at position (10, 10), so 10 units far from the left and the top of the PDF document. Here a "unit" is 1 pt = 1/72 inch.
The first 100 represents the width of the rectangle and the second one its height.
To sum up, the right picture is the first one.
I wrote this code to extract some areas of a page given as arguments to the function:
Rectangle2D region = new Rectangle2D.Double(x, y, width, height);
String regionName = "region";
PDFTextStripperByArea stripper;
stripper = new PDFTextStripperByArea();
stripper.addRegion(regionName, region);
stripper.extractRegions(page);
So, x and y are the absolute coordinates of the upper-left corner of the Rectangle and then you specify its width and height. page is a PDPage variable given as argument to this function.
Was looking into doing something like this, so I thought I'd pass what I found along.
Here's the code for creating my original pdf using itext.
import com.lowagie.text.Document
import com.lowagie.text.Paragraph
import com.lowagie.text.pdf.PdfWriter
class SimplePdfCreator {
void createFrom(String path) {
Document d = new Document()
try {
PdfWriter writer = PdfWriter.getInstance(d, new FileOutputStream(path))
d.open()
d.add(new Paragraph("This is a test."))
d.close()
} catch (Exception e) {
e.printStackTrace()
}
}
}
If you crack open the pdf, you'll see the text in the upper left hand corner. Here's the test showing what you are looking for.
#Test
void createFrom_using_pdf_box_to_extract_text_targeted_extraction() {
new SimplePdfCreator().createFrom("myFileLocation")
def doc = PDDocument.load("myFileLocation")
Rectangle2D.Double d = new Rectangle2D.Double(0, 0, 120, 100)
def stripper = new PDFTextStripperByArea()
def pages = doc.getDocumentCatalog().allPages
stripper.addRegion("myRegion", d)
stripper.extractRegions(pages[0])
assert stripper.getTextForRegion("myRegion").contains("This is a test.")
}
Position (0, 0) is the upper left hand corner of the document. The width and height are heading down and to the right. I was able to trim down the range a bit to (35, 52, 120, 3) and still get the test to pass.
All code is written in groovy.
Code in java using PDFBox.
public String fetchTextByRegion(String path, String filename, int pageNumber) throws IOException {
File file = new File(path + filename);
PDDocument document = PDDocument.load(file);
//Rectangle2D region = new Rectangle2D.Double(x,y,width,height);
Rectangle2D region = new Rectangle2D.Double(0, 100, 550, 700);
String regionName = "region";
PDFTextStripperByArea stripper;
PDPage page = document.getPage(pageNumber + 1);
stripper = new PDFTextStripperByArea();
stripper.addRegion(regionName, region);
stripper.extractRegions(page);
String text = stripper.getTextForRegion(regionName);
return text;
}