I am trying to convert a PDF to a CSV file. The PDF has data in tabular format, with the first row as a header. I can already extract text from a cell, compare the baselines of text in the table, and detect newlines, but I need to compare table borders to detect where a table starts. I do not know how to detect and compare lines in a PDF. Can anyone help me?
Thanks!!!
As you've seen (hopefully), PDFs have no concept of tables, just text placed at specific locations and lines drawn around them. There is no internal relationship between the text and the lines. This is very important to understand.
Knowing this, if all of the cells have enough padding, you can look for gaps between characters that are large enough, such as the width of three or more spaces. If the cells don't have enough spacing, this will unfortunately probably break.
You could also look at every line in the PDF and try to figure out what represents your "table-like" lines. See this answer for how to walk every token on a page to see what's being drawn.
I was also searching for an answer to a similar question, but unfortunately I didn't find one, so I did it on my own.
A PDF page like this: [image: sample PDF page]
will give the output as: [image: console output]
Here is the GitHub link for the .NET console application I made:
https://github.com/Justabhi96/Detect_And_Extract_Table_From_Pdf
This application detects the tables on a specific page of the PDF and prints them in a table format on the console.
Here is the code that I used to make this application.
First of all, I extracted the text from the PDF along with its coordinates, using a class that extends the iTextSharp.text.pdf.parser.LocationTextExtractionStrategy class of iTextSharp. The code is as follows:
This is the class that stores the chunks with their coordinates and text.
using System;
using System.Collections.Generic;
using System.Linq;
using System.Web;
namespace itextPdfTextCoordinates
{
public class RectAndText
{
public iTextSharp.text.Rectangle Rect;
public String Text;
public RectAndText(iTextSharp.text.Rectangle rect, String text)
{
this.Rect = rect;
this.Text = text;
}
}
}
And this is the class that extends the LocationTextExtractionStrategy class.
using iTextSharp.text.pdf.parser;
using System;
using System.Collections.Generic;
using System.Linq;
using System.Web;
namespace itextPdfTextCoordinates
{
public class MyLocationTextExtractionStrategy : LocationTextExtractionStrategy
{
public List<RectAndText> myPoints = new List<RectAndText>();
//Automatically called for each chunk of text in the PDF
public override void RenderText(TextRenderInfo renderInfo)
{
base.RenderText(renderInfo);
//Get the bounding box for the chunk of text
var bottomLeft = renderInfo.GetDescentLine().GetStartPoint();
var topRight = renderInfo.GetAscentLine().GetEndPoint();
//Create a rectangle from it
var rect = new iTextSharp.text.Rectangle(
bottomLeft[Vector.I1],
bottomLeft[Vector.I2],
topRight[Vector.I1],
topRight[Vector.I2]
);
//Add this to our main collection
this.myPoints.Add(new RectAndText(rect, renderInfo.GetText()));
}
}
}
This class overrides the RenderText method of the LocationTextExtractionStrategy class, which is called for each chunk when you extract text from a PDF page using the PdfTextExtractor.GetTextFromPage() method.
using itextPdfTextCoordinates;
using iTextSharp.text.pdf;
//Create an instance of our strategy
var t = new MyLocationTextExtractionStrategy();
var path = "F:\\sample-data.pdf";
//Parse every page of the document above
using (var r = new PdfReader(path))
{
for (var i = 1; i <= r.NumberOfPages; i++)
{
// Calling this function adds all the chunks with their coordinates to the
// 'myPoints' variable of 'MyLocationTextExtractionStrategy' Class
var ex = iTextSharp.text.pdf.parser.PdfTextExtractor.GetTextFromPage(r, i, t);
}
}
//Here you can loop over the chunks of the PDF
foreach (var chunk in t.myPoints)
{
Console.WriteLine("character {0} is at {1}*{2}", chunk.Text, chunk.Rect.Left, chunk.Rect.Top);
}
Now, to detect the start and end of a table, you can use the coordinates of the chunks extracted from the PDF.
If a line is not part of a table, there will be no jump between the right coordinate of the current chunk and the left coordinate of the next chunk. Lines that are part of a table, however, will have coordinate jumps of at least 3 points.
For example, a line that is part of a table will have chunk coordinates something like this:
right coord of current chunk -> 12.75pts
left coord of next chunk -> 20.30pts
You can use this logic to detect tables in the PDF.
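The gap rule above can be sketched independently of iTextSharp. This is only an illustration in plain JavaScript, not the code from the application: the chunk shape ({ text, left, right }), the function name splitIntoColumns and the 3-point threshold are assumptions made for the example, and the chunks are assumed to be sorted left to right already.

```javascript
// Hypothetical chunk shape: { text, left, right }, coordinates in PDF points,
// already sorted from left to right within one line.
function splitIntoColumns(chunks, minGap) {
  var columns = [];
  var current = "";
  for (var i = 0; i < chunks.length; i++) {
    current += chunks[i].text;
    var next = chunks[i + 1];
    // A horizontal gap of at least minGap points before the next chunk
    // (or the end of the line) closes the current column.
    if (!next || next.left - chunks[i].right >= minGap) {
      columns.push(current);
      current = "";
    }
  }
  return columns;
}

// Two adjacent chunks followed by a jump from 12.75pt to 20.30pt.
var line = [
  { text: "Na", left: 5.0, right: 9.0 },
  { text: "me", left: 9.1, right: 12.75 },
  { text: "Age", left: 20.3, right: 30.0 }
];
console.log(splitIntoColumns(line, 3));
```

A line that yields a single column is then treated as ordinary text, and a line that yields two or more columns as a table row.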
The code is as follows:
using itextPdfTextCoordinates;
using iTextSharp.text.pdf;
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
namespace ConsoleApp1
{
class LineUsingCoordinates
{
public static List<List<string>> getLineText(string path, int page, float[] coord)
{
//Create an instance of our strategy
var t = new MyLocationTextExtractionStrategy();
//Parse the requested page of the document
using (var r = new PdfReader(path))
{
// Calling this function adds all the chunks with their coordinates to the
// 'myPoints' variable of 'MyLocationTextExtractionStrategy' Class
var ex = iTextSharp.text.pdf.parser.PdfTextExtractor.GetTextFromPage(r, page, t);
}
// List of columns in one line
List<string> lineWord = new List<string>();
// temporary list for working around appending the <List<List<string>>
List<string> tempWord;
// List of rows. rows are list of string
List<List<string>> lineText = new List<List<string>>();
// List consisting list of chunks related to each line
List<List<RectAndText>> lineChunksList = new List<List<RectAndText>>();
//List consisting the chunks for whole page;
List<RectAndText> chunksList;
// List consisting the list of Bottom coord of the lines present in the page
List<float> bottomPointList = new List<float>();
//Getting List of Coordinates of Lines in the page no matter it's a table or not
foreach (var i in t.myPoints)
{
Console.WriteLine("character {0} is at {1}*{2}", i.Text, i.Rect.Left, i.Rect.Top);
// If the coords passed to the function is not null then process the part in the
// given coords of the page otherwise process the whole page
if (coord != null)
{
if (i.Rect.Left >= coord[0] &&
i.Rect.Bottom >= coord[1] &&
i.Rect.Right <= coord[2] &&
i.Rect.Top <= coord[3])
{
float bottom = i.Rect.Bottom;
if (bottomPointList.Count == 0)
{
bottomPointList.Add(bottom);
}
else if (Math.Abs(bottomPointList.Last() - bottom) > 3)
{
bottomPointList.Add(bottom);
}
}
}
// else process the whole page
else
{
float bottom = i.Rect.Bottom;
if (bottomPointList.Count == 0)
{
bottomPointList.Add(bottom);
}
else if (Math.Abs(bottomPointList.Last() - bottom) > 3)
{
bottomPointList.Add(bottom);
}
}
}
// Sometimes the above List will be having some elements which are from the same line but are
// having different coordinates due to some characters like " ",".",etc.
// And these coordinates will be having the difference of at most 4 points between
// their bottom coordinates.
//so to remove those elements we create two new lists which we need to remove from the original list
//This list will be having the elements which are having different but a little difference in coordinates
List<float> removeList = new List<float>();
// This list is having the elements which are having the same coordinates
List<float> sameList = new List<float>();
// Here we are adding the elements in those two lists to remove the elements
// from the original list later
for (var i = 0; i < bottomPointList.Count; i++)
{
var basePoint = bottomPointList[i];
for (var j = i+1; j < bottomPointList.Count; j++)
{
var comparePoint = bottomPointList[j];
//here we are getting the elements with same coordinates
if (Math.Abs(comparePoint - basePoint) == 0)
{
sameList.Add(comparePoint);
}
// here we are getting the elements whose coordinates differ, but by
// less than 4 points
else if (Math.Abs(comparePoint - basePoint) < 4)
{
removeList.Add(comparePoint);
}
}
}
// Here we are removing the matching elements of remove list from the original list
bottomPointList = bottomPointList.Where(item => !removeList.Contains(item)).ToList();
//Here we are removing the first matching element of same list from the original list
foreach (var r in sameList)
{
bottomPointList.Remove(r);
}
// Here we are getting the characters of the same line in a List 'chunkList'.
foreach (var bottomPoint in bottomPointList)
{
chunksList = new List<RectAndText>();
for (int i = 0; i < t.myPoints.Count; i++)
{
// If the character is having same bottom coord then add it to chunkList
if (bottomPoint == t.myPoints[i].Rect.Bottom)
{
chunksList.Add(t.myPoints[i]);
}
// If character is having a difference of less than 3 in the bottom coord then also
// add it to chunkList because the coord of the next line will differ at least 10 points
// from the coord of current line
else if (Math.Abs(t.myPoints[i].Rect.Bottom - bottomPoint) < 3)
{
chunksList.Add(t.myPoints[i]);
}
}
// Here we are adding the chunkList related to each line
lineChunksList.Add(chunksList);
}
bool sameLine = false;
//Here we are looping through the lines consisting the chunks related to each line
foreach(var linechunk in lineChunksList)
{
var text = "";
// Here we are looping through the chunks of the specific line to put the texts
// that are having a cord jump in their left coordinates.
// because only the line having table will be having the coord jumps in their
// left coord not the line having texts
for (var i = 0; i< linechunk.Count-1; i++)
{
// If the coord is having a jump of less than 3 points then it will be in the same
// column otherwise the next chunk belongs to different column
if (Math.Abs(linechunk[i].Rect.Right - linechunk[i + 1].Rect.Left) < 3)
{
if (i == linechunk.Count - 2)
{
text += linechunk[i].Text + linechunk[i+1].Text ;
}
else
{
text += linechunk[i].Text;
}
}
else
{
if (i == linechunk.Count - 2)
{
// add the text to the column and set the value of next column to ""
text += linechunk[i].Text;
// this is the list of columns in other word its the row
lineWord.Add(text);
text = "";
text += linechunk[i + 1].Text;
lineWord.Add(text);
text = "";
}
else
{
text += linechunk[i].Text;
lineWord.Add(text);
text = "";
}
}
}
if(text.Trim() != "")
{
lineWord.Add(text);
}
// creating a temporary list of strings for the List<List<string>> manipulation
tempWord = new List<string>();
tempWord.AddRange(lineWord);
// "lineText" is the type of List<List<string>>
// this is our list of rows. and rows are List of strings
// here we are adding the row to the list of rows
lineText.Add(tempWord);
lineWord.Clear();
}
return lineText;
}
}
}
You can call the getLineText() method of the above class and run the following loop to see the output in a table structure on the console.
var testFile = "F:\\sample-data.pdf";
float[] limitCoordinates = { 52, 671, 357, 728 };//{LowerLeftX,LowerLeftY,UpperRightX,UpperRightY}
// This line gives the list of rows, each consisting of one or more columns.
// If you pass null as the third parameter, it returns the content for the whole page,
// but if you pass the coordinates, it returns the content for that region only
var lineText = LineUsingCoordinates.getLineText(testFile, 1, null);
//var lineText = LineUsingCoordinates.getLineText(testFile, 1, limitCoordinates);
// To detect the table we use the fact that a 'lineText' item with fewer than two
// elements is surely not part of the table, while an item with two or more
// elements is part of the table
foreach (var row in lineText)
{
if (row.Count > 1)
{
for (var col = 0; col < row.Count; col++)
{
string trimmedValue = row[col].Trim();
if (trimmedValue != "")
{
Console.Write("|" + trimmedValue + "|");
}
}
Console.WriteLine("");
}
}
Console.ReadLine();
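Since the original goal was a CSV file rather than console output, the detected rows still need to be written out with proper quoting. A minimal sketch in plain JavaScript, assuming the rows arrive as arrays of strings like the lineText result above; the names toCsvField and rowsToCsv are made up for this example:

```javascript
// Quote a field only when it contains a comma, a double quote or a line break;
// embedded double quotes are doubled.
function toCsvField(value) {
  var s = String(value).trim();
  if (/[",\r\n]/.test(s)) {
    return '"' + s.replace(/"/g, '""') + '"';
  }
  return s;
}

// Keep only rows with more than one column (the same table test as above)
// and join them into CSV text.
function rowsToCsv(rows) {
  var lines = [];
  for (var i = 0; i < rows.length; i++) {
    if (rows[i].length > 1) {
      lines.push(rows[i].map(toCsvField).join(","));
    }
  }
  return lines.join("\r\n");
}

var sampleRows = [["Report title"], ["Name", "Amount"], ["Smith, J.", "12.50"]];
console.log(rowsToCsv(sampleRows));
```

The same quoting rule carries over directly to C# if you build the lines with a StringBuilder.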
I would like to create a script for Photoshop that lets me search for files with specific names (for example: 300x250_F1.jpg, 300x250_F2.jpg, 300x600_F1.jpg, etc.) in different subfolders (all in the same parent folder) and then load them into my active document. The problem is that the names of the subfolders will be different every time.
I definitely need some help :)
I found some code that almost does what I want (thank you).
I'm almost there, but I have a problem: if the variable "mask" has only one value, it works. But with several values, it doesn't work anymore.
I think it's because I made the mask variable an array and I have to update the script...
var topFolder = Folder.selectDialog("");
//var topFolder = new Folder('~/Desktop/PS_TEST');
var fileandfolderAr = scanSubFolders(topFolder, /\.(jpg)$/i);
//var fileandfolderAr = scanSubFolders(topFolder, /\.(jpg|tif|psd|bmp|gif|png|)$/i);
var fileList = fileandfolderAr[0];
var nom = decodeURI(fileList);
//all file paths found and amount of files found
//alert("fileList: " + nom + "\n\nFile Amount: " + fileList.length);
//alert(allFiles);
for (var a = 0; a < fileList.length; a++) {
var docRef = open(fileList[a]);
//do things here
}
function scanSubFolders(tFolder, mask) { // folder object, RegExp or string
var sFolders = [];
var allFiles = [];
var mask = ["300x250_F1", "300x250_F2"];
sFolders[0] = tFolder;
for (var j = 0; j < sFolders.length; j++) { // loop through folders
var procFiles = sFolders[j].getFiles();
for (var i = 0; i < procFiles.length; i++) { // loop through this folder contents
if (procFiles[i] instanceof File) {
if (mask == undefined) {
allFiles.push(procFiles); // if no search mask collect all files
}
if (procFiles[i].fullName.search(mask) != -1) {
allFiles.push(procFiles[i]); // otherwise only those that match mask
}
}
else if (procFiles[i] instanceof Folder) {
sFolders.push(procFiles[i]); // store the subfolder
scanSubFolders(procFiles[i], mask); // search the subfolder
}
}
}
return [allFiles, sFolders];
}
There are several ways you can accomplish this without reassigning mask within the scanSubFolders function.
Solution 1: use a regex
The function is already set up to accept a regex or string as a mask. You just need to use one that would match the pattern of the files you're targeting.
var fileandfolderAr = scanSubFolders(topFolder, /300x250_F(1|2)/gi);
Solution 2: call the function within a loop
If regex isn't your thing, you could still utilize an array of strings, but do it outside the function. Loop the array of masks and call the function with each one, then execute your primary logic on the results of each call.
var topFolder = Folder.selectDialog("");
var myMasks = ["300x250_F1", "300x250_F2"];
for (var index = 0; index < myMasks.length; index++) {
var mask = myMasks[index];
var fileandfolderAr = scanSubFolders(topFolder, mask);
var fileList = fileandfolderAr[0];
for (var a = 0; a < fileList.length; a++) {
var docRef = open(fileList[a]);
//do things here
}
}
Don't forget to remove var mask = ["300x250_F1", "300x250_F2"]; from within the scanSubFolders function or else these won't work.
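As for why the array version fails: when search() receives something that is not a RegExp, it converts it to a pattern string first. A one-element array becomes "300x250_F1", which still matches, while a two-element array becomes "300x250_F1,300x250_F2", which matches no file name. The sketch below shows this in standard JavaScript; ExtendScript is an older dialect, so treat the identical behaviour there as an assumption. The helper buildMask is hypothetical and simply builds the Solution 1 style regex from a list of names.

```javascript
var path = "/parent/sub1/300x250_F2.jpg";

// A one-element array stringifies to a usable pattern.
var oneMask = ["300x250_F2"];
// A two-element array stringifies to "300x250_F1,300x250_F2" (note the comma).
var twoMasks = ["300x250_F1", "300x250_F2"];

console.log(path.search(oneMask) !== -1);  // match found
console.log(path.search(twoMasks) !== -1); // no match

// Hypothetical helper: turn a list of plain names into one alternation regex.
function buildMask(names) {
  var escaped = [];
  for (var i = 0; i < names.length; i++) {
    // Escape regex metacharacters so names are matched literally.
    escaped.push(names[i].replace(/[.*+?^${}()|[\]\\]/g, "\\$&"));
  }
  return new RegExp("(" + escaped.join("|") + ")", "i");
}

console.log(buildMask(twoMasks).test(path));
```

The result of buildMask(myMasks) can be passed as the mask argument in place of the hand-written regex.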
How can you use FileHelpers to read a fixed-length file with no line breaks or delimiters? Basically, one to many records on a single line. I have to read in a fixed-length record with 13 fields totaling 80 characters. If there are 3 records, for example, that would be a single line with 240 characters in it. The answer can't be to go to the source and have them output the file differently. They won't budge. I can abandon FileHelpers, but I like how it works and I first want to see if it's possible before I move on. To answer this, you would have to be familiar with FileHelpers.
Here is a simple sample like I explained above, with 34-character records...
var FileContent = "car  2010Ford      Mustang        Truck2011Chevy     S10            Car  2018Toyota    Corola         SUV  2017Jeep      Wrangler       ";
[FixedLengthRecord(FixedMode.ExactLength)]
public class dtoCarRecord
{
[FieldFixedLength(5)]
public string Type;
[FieldFixedLength(4)]
public string Year;
[FieldFixedLength(10)]
public string Make;
[FieldFixedLength(15)]
public string Model;
}
void ApplyDateUpdates(object parameter)
{
var raRecords = new List<dtoRARecord>();
var engine = new FileHelperAsyncEngine<dtoRARecord>();
// Read
using (engine.BeginReadFile((string)parameter))
{
// The engine is IEnumerable
foreach (dtoRARecord detail in engine)
{
// your code here
raRecords.Add(detail);
}
}
}
I'm expecting 4 records out of this file.
Just split your file every 80 characters before importing it.
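// Inserts a newline after every maxWidth characters so that each record ends up on its own line.
private static string SplitIntoChunks(string text, int maxWidth)
{
var sb = new StringBuilder(text);
// Compute the record count once, from the original length; sb.Length grows with every insert.
int recordCount = text.Length / maxWidth;
for (int i = 1; i < recordCount; i++)
{
// Each earlier insert shifted the remaining text right by one character.
int insertPosition = i * maxWidth + i - 1;
sb.Insert(insertPosition, "\n");
}
return sb.ToString();
}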
var splitIntoChunks = SplitIntoChunks(FileContent, 34);
using (engine.BeginReadString(splitIntoChunks))
{
// The engine is IEnumerable
foreach (dtoRARecord detail in engine)
{
// your code here
raRecords.Add(detail);
}
}
Assert.AreEqual(4, raRecords.Count());
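The same splitting idea can also be written without insert-position arithmetic, by slicing the input into fixed-width records and joining them with newlines. A minimal sketch in plain JavaScript, not FileHelpers code: splitFixedWidth, pad and makeRecord are names invented for this example, and a trailing partial record is assumed to be noise and dropped.

```javascript
// Cut text into consecutive records of exactly `width` characters.
// A trailing fragment shorter than `width` is ignored.
function splitFixedWidth(text, width) {
  var records = [];
  for (var start = 0; start + width <= text.length; start += width) {
    records.push(text.slice(start, start + width));
  }
  return records;
}

// Helpers that build sample data with the 5/4/10/15 field widths from the question.
function pad(s, n) {
  while (s.length < n) s += " ";
  return s;
}
function makeRecord(type, year, make, model) {
  return pad(type, 5) + pad(year, 4) + pad(make, 10) + pad(model, 15);
}

var content = makeRecord("car", "2010", "Ford", "Mustang") +
              makeRecord("Truck", "2011", "Chevy", "S10");
var records = splitFixedWidth(content, 34);
console.log(records.join("\n"));
```

The newline-joined result plays the same role as the string passed to BeginReadString above.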
I have implemented Google's Mobile Vision for Android by following a tutorial. I am trying to build an app that will scan a receipt and find the numeric total. However, as I scan different receipts that are printed in different formats, the API will detect TextBlocks in what seems to be an arbitrary way. For example, in one receipt, if several words of text are separated by single spaces, then they are grouped into a single TextBlock. However, if two words of text are separated by lots of spaces, then they are separated as independent TextBlocks, even though they appear on the same "line". What I am trying to do is force the API to recognize each entire line of the receipt as a single entity. Is this possible?
public ArrayList<T> getAllGraphicsInRow(float rawY) {
synchronized (mLock) {
ArrayList<T> row = new ArrayList<>();
// Get the position of this View so the raw location can be offset relative to the view.
int[] location = new int[2];
this.getLocationOnScreen(location);
for (T graphic : mGraphics) {
float rawX = this.getWidth();
for (int i=0; i<rawX; i+=10){
if (graphic.contains(i - location[0], rawY - location[1])) {
if(!row.contains(graphic)) {
row.add(graphic);
}
}
}
}
return row;
}
}
This should be in the GraphicOverlay.java file; it fetches all the graphics in the row at the given Y position.
public static boolean almostEqual(double a, double b, double eps){
return Math.abs(a-b)<(eps);
}
public static boolean pointAlmostEqual(Point a, Point b){
return almostEqual(a.y,b.y,10);
}
public static boolean cornerPointAlmostEqual(Point[] rect1, Point[] rect2){
boolean almostEqual=true;
for (int i=0; i<rect1.length;i++){
if (!pointAlmostEqual(rect1[i],rect2[i])){
almostEqual=false;
}
}
return almostEqual;
}
private boolean onTap(float rawX, float rawY) {
String priceRegex = "(\\d+[,.]\\d\\d)";
ArrayList<OcrGraphic> graphics = mGraphicOverlay.getAllGraphicsInRow(rawY);
OcrGraphic currentGraphics = mGraphicOverlay.getGraphicAtLocation(rawX,rawY);
if (graphics !=null && currentGraphics!=null) {
List<? extends Text> currentComponents = currentGraphics.getTextBlock().getComponents();
final Pattern pattern = Pattern.compile(priceRegex);
final Pattern pattern1 = Pattern.compile(priceRegex);
TextBlock text = null;
Log.i("text results", "This many in the row: " + Integer.toString(graphics.size()));
ArrayList<Text> combinedComponents = new ArrayList<>();
for (OcrGraphic graphic : graphics) {
if (!graphic.equals(currentGraphics)) {
text = graphic.getTextBlock();
Log.i("text results", text.getValue());
combinedComponents.addAll(text.getComponents());
}
}
for (Text currentText : currentComponents) { // goes through components in the row
final Matcher matcher = pattern.matcher(currentText.getValue()); // looks for
Point[] currentPoint = currentText.getCornerPoints();
for (Text otherCurrentText : combinedComponents) {//Looks for other components that are in the same row
final Matcher otherMatcher = pattern1.matcher(otherCurrentText.getValue()); // looks for
Point[] innerCurrentPoint = otherCurrentText.getCornerPoints();
if (cornerPointAlmostEqual(currentPoint, innerCurrentPoint)) {
if (matcher.find()) { // if you click on the price
Log.i("oh yes", "Item: " + otherCurrentText.getValue());
Log.i("oh yes", "Value: " + matcher.group(1));
itemList.add(otherCurrentText.getValue());
priceList.add(Float.valueOf(matcher.group(1)));
}
if (otherMatcher.find()) { // if you click on the item
Log.i("oh yes", "Item: " + currentText.getValue());
Log.i("oh yes", "Value: " + otherMatcher.group(1));
itemList.add(currentText.getValue());
priceList.add(Float.valueOf(otherMatcher.group(1)));
}
Toast toast = Toast.makeText(this, " Text Captured!" , Toast.LENGTH_SHORT);
toast.show();
}
}
}
return true;
}
return false;
}
This should be in OcrCaptureActivity.java. It breaks the tapped TextBlock into lines, finds the blocks in the same row as each line, checks whether the components are prices, and logs and stores the item and price values accordingly.
The eps value in almostEqual is the vertical tolerance used when checking whether graphics are in the same row.
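The row-matching idea can be shown in isolation from the Mobile Vision classes. The sketch below is plain JavaScript rather than Android code, and everything in it is an assumption for illustration: the box shape ({ text, y }), the function name groupIntoRows, and the tolerance of 10 pixels.

```javascript
// Same tolerance test as almostEqual above.
function almostEqual(a, b, eps) {
  return Math.abs(a - b) < eps;
}

// Hypothetical box shape: { text, y }, where y is the top edge in pixels.
function groupIntoRows(boxes, eps) {
  var sorted = boxes.slice().sort(function (p, q) { return p.y - q.y; });
  var rows = [];
  for (var i = 0; i < sorted.length; i++) {
    var last = rows[rows.length - 1];
    // Compare against the first box of the current row so the tolerance
    // does not drift as boxes are added.
    if (last && almostEqual(last[0].y, sorted[i].y, eps)) {
      last.push(sorted[i]);
    } else {
      rows.push([sorted[i]]);
    }
  }
  return rows;
}

var boxes = [
  { text: "Coffee", y: 100 },
  { text: "3.50", y: 104 },
  { text: "TOTAL", y: 160 },
  { text: "7.25", y: 157 }
];
var rows = groupIntoRows(boxes, 10);
console.log(rows.length);
```

A larger eps tolerates skewed receipts better but risks merging neighbouring lines, so it is worth tuning against real scans.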
I need to draw an image on a page that has a specific form field. Using pdfsharp, given a field name, how do I find the pdf page associated with that field?
Here is an improved version with corrections, which also returns the page number:
PdfPage GetPageFromField(PdfDocument myDocument, string focusFieldName, out int pageNum)
{
// get the field we're looking for
PdfAcroField currentField = (PdfAcroField)(myDocument.AcroForm.Fields[focusFieldName]);
pageNum = 0;
if (currentField != null)
{
// get the page element
var focusPageReference = (PdfReference)currentField.Elements["/P"];
// loop through our pages to match the reference
foreach (var page in myDocument.Pages)
{
pageNum++;
if (page.Reference == focusPageReference)
{
return page;
}
}
}
// could not find a page for this field
return null;
}
You can access the page reference for the field using the page element of the field object. Then use this reference to match the page in the document.
public PdfPage GetPageFromField( PdfDocument myDocument, string focusFieldName )
{
// get the field we're looking for
PdfTextField currentField = (PdfTextField)( myDocument.AcroForm.Fields[focusFieldName]);
if( currentField != null )
{
// get the page element
var focusPageReference = (PdfReference)currentField.Elements["/P"];
// loop through our pages to match the reference
foreach( var page in myDocument.Pages )
{
if( page.Reference == focusPageReference )
{
return page;
}
}
}
// could not find a page for this field
return null;
}
I was wondering if I could get some help converting this into a 3 column (going down to left) per page report.
using System;
using System.IO;
using iTextSharp.text;
using iTextSharp.text.pdf;
namespace ConsoleApp
{
class Program
{
static void Main(string[] args)
{
Console.WriteLine("-> Creates a PDF file with a block of Text.");
Document document = new Document(PageSize.LETTER);
try
{
PdfWriter writer = PdfWriter.GetInstance(
document,
new FileStream("c:\\temp\\column_example.pdf", FileMode.Create));
document.Open();
PdfContentByte cb = writer.DirectContent;
float pos;
PdfPTable table;
PdfPCell cell = new PdfPCell(new Phrase(string.Empty));
Phrase phrase;
float columnWidth = (PageSize.LETTER.Width - 36);
ColumnText ct = GetColumn(cb, columnWidth);
int status = 0;
string line = "Line{0}";
for(int i=0; i<50; i++)
{
table = new PdfPTable(1);
table.SpacingAfter = 9F;
cell = new PdfPCell(new Phrase("Header for table " + i));
table.AddCell(cell);
for (int j = 0; j < (i%2 == 0 ? 5 : 7); j++)
{
phrase = new Phrase(string.Format(line, i));
cell = new PdfPCell(phrase);
table.AddCell(cell);
}
ct.AddElement(table);
pos = ct.YLine;
status = ct.Go(true);
Console.WriteLine("Lines written:" + ct.LinesWritten + " Y-position: " + pos + " - " + ct.YLine);
if (!ColumnText.HasMoreText(status))
{
ct.AddElement(table);
ct.YLine = pos;
ct.Go(false);
}
else
{
document.NewPage();
ct.SetText(null);
ct.AddElement(table);
ct.YLine = PageSize.LETTER.Height - 36;
ct.Go();
}
}
}
catch (DocumentException de)
{
Console.Error.WriteLine(de.Message);
}
catch (IOException ioe)
{
Console.Error.WriteLine(ioe.Message);
}
finally
{
document.Close();
}
Console.ReadLine();
}
private static ColumnText GetColumn(PdfContentByte cb, float columnWidth)
{
var ct = new ColumnText(cb);
ct.SetSimpleColumn(36, 36, columnWidth, PageSize.LETTER.Height - 36, 18, Element.ALIGN_JUSTIFIED);
return ct;
}
}
}
I'm really new to iTextSharp and can't find any good examples of how to do this.
Thanks for any help
The easiest way to do that is to put your individual tables into a master 3-column table. Below is code that does that. You'll probably want to adjust margins, widths and borders but this should get you started at least.
Also, since you said you were new to iTextSharp I'm going to assume that you don't have a specific need for using DirectContent. DC is very powerful but most of what you need to do with iTextSharp you can do through specific objects instead. The code below has all DC stuff removed.
//(iText 5.1.1.0)
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using iTextSharp.text;
using iTextSharp.text.pdf;
using System.IO;
namespace ConsoleApplication1
{
class Program
{
static void Main(string[] args)
{
string filePath = Path.Combine(Environment.GetFolderPath(Environment.SpecialFolder.Desktop), "column_example.pdf");
Console.WriteLine("-> Creates a PDF file with a block of Text.");
Document document = new Document(PageSize.LETTER);
try
{
PdfWriter writer = PdfWriter.GetInstance(document, new FileStream(filePath, FileMode.Create));
document.Open();
//Create a master table with 3 columns
PdfPTable masterTable = new PdfPTable(3);
//Set the column widths, this should probably be adjusted
masterTable.SetWidths(new float[] { 200, 200, 200 });
PdfPTable table;
PdfPCell cell;
Phrase phrase;
string line = "Line{0}";
for (int i = 0; i < 50; i++)
{
table = new PdfPTable(1);
table.SpacingAfter = 9F;
cell = new PdfPCell(new Phrase("Header for table " + i));
table.AddCell(cell);
for (int j = 0; j < (i % 2 == 0 ? 5 : 7); j++)
{
phrase = new Phrase(string.Format(line, i));
cell = new PdfPCell(phrase);
table.AddCell(cell);
}
//Add the sub-table to our master table instead of the writer
masterTable.AddCell(table);
}
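//Fill any empty cells in the last row; an incomplete final row is otherwise not rendered
masterTable.CompleteRow();
//Add the master table to our document
document.Add(masterTable);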
}
catch (DocumentException de)
{
Console.Error.WriteLine(de.Message);
}
catch (IOException ioe)
{
Console.Error.WriteLine(ioe.Message);
}
finally
{
document.Close();
}
Console.ReadLine();
}
}
}
EDIT
Sorry, I didn't understand from your original post what you were looking for, but I do now. Unfortunately you are entering the realm of Math and Mod. I don't have the time (or the brain power this morning) to go through this completely, but hopefully I can give you a start.
The entire programming world is based on left-to-right and then top-to-bottom, when you switch it around you tend to have to jump through giant hoops to do what you want (like making an HTML list into 3 columns alphabetized with A's in column 1, B's in column 2, etc.)
In order to do what you want, you need to know the heights of the tables so that you can calculate how many you can fit vertically on the page. Unfortunately table height isn't known until render time. The solution (at least for me) is to draw each table to a temporary document which allows us to know the height, then we store the table in an array and throw away the document. Now we've got an array of tables with known heights that we can walk through.
The snippet below does all of this. I changed your row count rule to a random number from 2 to 9 just to get more variety in the sample data. Also, starting with iTextSharp 5.1 (I think that's the right version) many of the "big" objects support IDisposable, so I'm wrapping them in using blocks. If you are using an older version, you'll need to drop the using blocks and switch to normal variable declarations. Hopefully the comments make sense. You'll see that I pulled out some magic numbers into variables, too.
//Our array of tables
List<PdfPTable> Tables = new List<PdfPTable>();
//Create a random number of rows to get better sample data
int rowCount;
Random r = new Random();
string line = "Line {0}";
PdfPTable table;
//This is the horizontal padding between tables
float hSpace = 5;
//Total number of columns that we want
int columnCount = 3;
//Create a temporary document to write our table to so that their sizes can be calculated
using (Document tempDoc = new Document(PageSize.LETTER))
{
using (MemoryStream tempMS = new MemoryStream())
{
using (PdfWriter tempW = PdfWriter.GetInstance(tempDoc, tempMS))
{
tempDoc.Open();
//Calculate the table width which is the usable space minus the padding between tables divided by the column count
float documentUseableWidth = tempDoc.PageSize.Width - tempDoc.LeftMargin - tempDoc.RightMargin;
float totalTableHPadding = (hSpace * (columnCount - 1));
float tableWidth = (documentUseableWidth - totalTableHPadding) / columnCount;
for (int i = 0; i < 50; i++)
{
table = new PdfPTable(1);
table.AddCell(new PdfPCell(new Phrase("Header for table " + i)));
rowCount = r.Next(2, 10);
for (int j = 0; j < rowCount; j++)
{
table.AddCell(new PdfPCell(new Phrase(string.Format(line, i))));
}
//In order to use WriteSelectedRows you need to set the width of the table
table.SetTotalWidth(new float[] { tableWidth });
//Write the table to our temporary document in order to calculate the height
//(rows are zero-indexed; 0 and -1 select every row, including the header)
table.WriteSelectedRows(0, -1, 0, 0, tempW.DirectContent);
//Add the table to our array
Tables.Add(table);
}
tempDoc.Close();
}
}
}
Once you've got your array of tables you can loop through those and draw them using:
Tables[i].WriteSelectedRows(0, -1, curX, curY, writer.DirectContent);
Where i is your current table index and curX and curY are your current coordinates.
Hopefully this gets you going in the right direction. WriteSelectedRows does a great job of putting a table exactly where you want it.
One last thing to remember, the Y coordinate that it takes starts at the bottom of the document, not the top, so 0 is the bottom and 720 is "above" it and not below.
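The walk over the measured tables (down one column, then on to the next column, then on to a new page) can be sketched without iTextSharp. This is a plain JavaScript illustration under stated assumptions, not tested PDF code: layoutTables and the opts fields are made-up names, the heights are assumed to be measured already, and the page geometry (Letter height with 36pt margins, 176pt table width) is chosen only for the example.

```javascript
// Compute a page, x and y for each table, filling columns top to bottom.
// PDF coordinates start at the bottom-left, so y decreases going down the page;
// the returned y is the top edge of the table.
function layoutTables(heights, opts) {
  var placements = [];
  var page = 1, col = 0, y = opts.top;
  for (var i = 0; i < heights.length; i++) {
    // Not enough room left in this column: move to the next column, or to a
    // new page once all columns are used. A table taller than a whole column
    // is still placed at the top rather than skipped forever.
    if (y - heights[i] < opts.bottom && y !== opts.top) {
      col++;
      y = opts.top;
      if (col === opts.columnCount) {
        col = 0;
        page++;
      }
    }
    placements.push({
      index: i,
      page: page,
      x: opts.left + col * (opts.tableWidth + opts.hSpace),
      y: y
    });
    y -= heights[i] + opts.vSpace;
  }
  return placements;
}

var opts = { top: 756, bottom: 36, left: 36, tableWidth: 176, hSpace: 5, vSpace: 9, columnCount: 3 };
var placements = layoutTables([300, 300, 300, 300, 300, 300, 300], opts);
console.log(placements);
```

Each placement's x and y would then be used as curX and curY in the drawing call above, with a new page started whenever the page number changes.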