Find rectangle in pdf using iText 7

Find rectangle in pdf using iText 7 - pdf

I've used foxit to create a pdf and then added a shape (rectangle) to the pdf.
I'm using the ILocationExtractionStrategy in order to locate the shape.
using (var pdfReader = new PdfReader(fileName))
{
using (PdfDocument docToSign = new PdfDocument(pdfReader))
{
var sf = new ShapeFilter();
new PdfCanvasProcessor(sf).ProcessPageContent(docToSign.GetFirstPage());
}
}
//ILocationExtractionStrategy implementation
public class ShapeFilter : ILocationExtractionStrategy
{
public void EventOccurred(IEventData data, EventType type)
{
System.Console.WriteLine(type);
}
public ICollection<IPdfTextLocation> GetResultantLocations()
{
return null;
}
public ICollection<EventType> GetSupportedEvents()
{
return null;
}
}
For my simple test document which only contains one rectangle, ILocationExtractionStrategy.EventOccurred is called three times (CLIP_PATH_CHANGED,RENDER_PATH,CLIP_PATH_CHANGED), but i'm unable to find the position of the shape.
I've used RUPS to check what property i should look for, but even there i cant find the rectangle.
The pdf can be downloaded from onedrive: https://1drv.ms/b/s!AkROTDoCWFJnlrZOPiV74h3456Trwg
Any hints how such a rectangle(the position of it) can be read using iText?

Related

How to set variable inside of a Coroutine after yielding a webrequest

Okay I will try and explain this to the best of my ability. I have searched and searched all day for a solution to this issue but can't seem to find it. The problem that I am having is that I have a list of scriptable objects that I am basically using for custom properties to create gameobjects off of. One of those properties that I need to get is a Texture2D that I turn into a sprite. Therefor, I am using UnityWebRequest in a Coroutine and am having to yield the response. After I get the response I am trying to set the variable. However even using Lambdas it seems to me that if I yield return the response before the result it will not set the variable. So every time I check the variable after the Coroutine it comes back null. If someone could enlighten me with what I am missing here that would be just great!
Here is the Scriptable Object Class I am using.
[CreateAssetMenu(fileName = "new movie",menuName = "movie")]
public class MovieTemplate : ScriptableObject
{
public string Title;
public string Description;
public string ImgURL;
public string mainURL;
public string secondaryURL;
public Sprite thumbnail;
}
Here is the call to the Coroutine
foreach (var item in nodes)
{
templates.Add(GetMovieData(item));
}
foreach (MovieTemplate movie in templates)
{
StartCoroutine(GetMovieImage(movie.ImgURL, result =>
{
movie.thumbnail = result;
}));
}
Here is the Coroutine itself
IEnumerator GetMovieImage(string url, System.Action<Sprite> result)
{
using (UnityWebRequest web = UnityWebRequestTexture.GetTexture(url))
{
yield return web.SendWebRequest();
var img = DownloadHandlerTexture.GetContent(web);
result(Sprite.Create(img, new Rect(0, 0, img.width, img.height), Vector2.zero));
}
}

From what you desribe it still seems that the texture is somehow disposed as soon as the routine finishes. My guess would be that it happens due to the using block.
I would store the original texture reference
[CreateAssetMenu(fileName = "new movie",menuName = "movie")]
public class MovieTemplate : ScriptableObject
{
public string Title;
public string Description;
public string ImgURL;
public string mainURL;
public string secondaryURL;
public Sprite thumbnail;
public Texture texture;
public void SetSprite(Sprite newSprite, Texture newTexture)
{
if(texture) Destroy(texture);
texture = newTexture;
var tex = (Texture2D) texture;
thumbnail = Sprite.Create(tex, new Rect(0, 0, tex.width, tex.height), Vector2.zero);
}
}
So you can keep track of the texture itself as well, let it not be collected by the GC but also destroy it when not needed anymore. Usually Texture2D is removed by the GC as soon as there is no reference to it anymore but Texture2D created by UnityWebRequest might behave different.
Than in the webrequest return the texture and don't use using
IEnumerator GetMovieImage(string url, System.Action<Texture> result)
{
UnityWebRequest web = UnityWebRequestTexture.GetTexture(url));
yield return web.SendWebRequest();
if(!web.error)
{
result?.Invoke(DownloadHandlerTexture.GetContent(web));
}
else
{
Debug.LogErrorFormat(this, "Download error: {0} - {1}", web.responseCode, web.error);
}
}
and finally use it like
for (int i = 0; i < templates.Count; i++)
{
int index = i;//If u use i, it will be overriden too so we make a copy of it
StartCoroutine(
GetMovieImage(
templates[index].ImgURL,
result =>
{
templates[index].SetSprite(result);
})
);
}

The problem is with this section of your code :
foreach (MovieTemplate movie in templates)
{
StartCoroutine(GetMovieImage(movie.ImgURL, result =>
{
movie.thumbnail = result;//wrong movie obj
}));
}
Here you will loose refrence to movie object(override by foreach) before the result of callback arrive .
Change it to something like this :
foreach (int i=0;i<templates.Length;i++)
{
int index= i;//If u use i, it will be overriden too so we make a copy of it
StartCoroutine(GetMovieImage(movie.ImgURL, result =>
{
templates[index].thumbnail = result;
}));
}

How to replace / remove text from a PDF file

How would I go about replacing / removing text from a PDF file?
I have a PDF file that I obtained somewhere, and I want to be able to replace some text within it.
Or, I have a PDF file that I want to obscure (redact) some of the text within it so that it's no longer visible [and so that it looks cool, like the CIA files].
Or, I have a PDF that contains global Javascript that I want to stop from interrupting my use of the PDF.

This is possible in a limited fashion with the use of iText / iTextSharp.
It will only work with Tj/TJ opcodes (i.e. standard text, not text embedded in images, or drawn with shapes).
You need to override the default PdfContentStreamProcessor to act on the page content streams, as presented by Mkl here Removing Watermark from PDF iTextSharp. Inherit from this class, and in your new class look for the Tj/TJ opcodes, the operand(s) will generally be the text element(s) (for a TJ this may not be straightforward text, and may require further parsing of all the operands).
A pretty basic example of some of the flexibility around iTextSharp is available from this github repository https://github.com/bevanweiss/PdfEditor (code excerpts below also)
NOTE: This utilises the AGPL version of iTextSharp (and is hence also AGPL), so if you will be distributing executables derived from this code or allowing others to interact with those executables in any way then you must also provide your modified source code. There is also no warranty, implied or expressed, related to this code. Use at your own peril.
PdfContentStreamEditor
using System.Collections.Generic;
using iTextSharp.text.pdf;
using iTextSharp.text.pdf.parser;
namespace PDFCleaner
{
public class PdfContentStreamEditor : PdfContentStreamProcessor
{
/**
* This method edits the immediate contents of a page, i.e. its content stream.
* It explicitly does not descent into form xobjects, patterns, or annotations.
*/
public void EditPage(PdfStamper pdfStamper, int pageNum)
{
var pdfReader = pdfStamper.Reader;
var page = pdfReader.GetPageN(pageNum);
var pageContentInput = ContentByteUtils.GetContentBytesForPage(pdfReader, pageNum);
page.Remove(PdfName.CONTENTS);
EditContent(pageContentInput, page.GetAsDict(PdfName.RESOURCES), pdfStamper.GetUnderContent(pageNum));
}
/**
* This method processes the content bytes and outputs to the given canvas.
* It explicitly does not descent into form xobjects, patterns, or annotations.
*/
public virtual void EditContent(byte[] contentBytes, PdfDictionary resources, PdfContentByte canvas)
{
this.Canvas = canvas;
ProcessContent(contentBytes, resources);
this.Canvas = null;
}
/**
* This method writes content stream operations to the target canvas. The default
* implementation writes them as they come, so it essentially generates identical
* copies of the original instructions the {#link ContentOperatorWrapper} instances
* forward to it.
*
* Override this method to achieve some fancy editing effect.
*/
protected virtual void Write(PdfContentStreamProcessor processor, PdfLiteral operatorLit, List<PdfObject> operands)
{
var index = 0;
foreach (var pdfObject in operands)
{
pdfObject.ToPdf(null, Canvas.InternalBuffer);
Canvas.InternalBuffer.Append(operands.Count > ++index ? (byte) ' ' : (byte) '\n');
}
}
//
// constructor giving the parent a dummy listener to talk to
//
public PdfContentStreamEditor() : base(new DummyRenderListener())
{
}
//
// constructor giving the parent a dummy listener to talk to
//
public PdfContentStreamEditor(IRenderListener renderListener) : base(renderListener)
{
}
//
// Overrides of PdfContentStreamProcessor methods
//
public override IContentOperator RegisterContentOperator(string operatorString, IContentOperator newOperator)
{
var wrapper = new ContentOperatorWrapper();
wrapper.SetOriginalOperator(newOperator);
var formerOperator = base.RegisterContentOperator(operatorString, wrapper);
return (formerOperator is ContentOperatorWrapper operatorWrapper ? operatorWrapper.GetOriginalOperator() : formerOperator);
}
public override void ProcessContent(byte[] contentBytes, PdfDictionary resources)
{
this.Resources = resources;
base.ProcessContent(contentBytes, resources);
this.Resources = null;
}
//
// members holding the output canvas and the resources
//
protected PdfContentByte Canvas = null;
protected PdfDictionary Resources = null;
//
// A content operator class to wrap all content operators to forward the invocation to the editor
//
class ContentOperatorWrapper : IContentOperator
{
public IContentOperator GetOriginalOperator()
{
return _originalOperator;
}
public void SetOriginalOperator(IContentOperator op)
{
this._originalOperator = op;
}
public void Invoke(PdfContentStreamProcessor processor, PdfLiteral oper, List<PdfObject> operands)
{
if (_originalOperator != null && !"Do".Equals(oper.ToString()))
{
_originalOperator.Invoke(processor, oper, operands);
}
((PdfContentStreamEditor)processor).Write(processor, oper, operands);
}
private IContentOperator _originalOperator = null;
}
//
// A dummy render listener to give to the underlying content stream processor to feed events to
//
class DummyRenderListener : IRenderListener
{
public void BeginTextBlock() { }
public void RenderText(TextRenderInfo renderInfo) { }
public void EndTextBlock() { }
public void RenderImage(ImageRenderInfo renderInfo) { }
}
}
}
TextReplaceStreamEditor
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;
using iTextSharp.text.pdf;
using iTextSharp.text.pdf.parser;
namespace PDFCleaner
{
public class TextReplaceStreamEditor : PdfContentStreamEditor
{
public TextReplaceStreamEditor(string MatchPattern, string ReplacePattern)
{
_matchPattern = MatchPattern;
_replacePattern = ReplacePattern;
}
private string _matchPattern;
private string _replacePattern;
protected override void Write(PdfContentStreamProcessor processor, PdfLiteral oper, List<PdfObject> operands)
{
var operatorString = oper.ToString();
if ("Tj".Equals(operatorString) || "TJ".Equals(operatorString))
{
for(var i = 0; i < operands.Count; i++)
{
if(!operands[i].IsString())
continue;
var text = operands[i].ToString();
if(Regex.IsMatch(text, _matchPattern))
{
operands[i] = new PdfString(Regex.Replace(text, _matchPattern, _replacePattern));
}
}
}
base.Write(processor, oper, operands);
}
}
}
TextRedactStreamEditor
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;
using iTextSharp.text;
using iTextSharp.text.pdf;
using iTextSharp.text.pdf.parser;
namespace PDFCleaner
{
public class TextRedactStreamEditor : PdfContentStreamEditor
{
public TextRedactStreamEditor(string MatchPattern) : base(new RedactRenderListener(MatchPattern))
{
_matchPattern = MatchPattern;
}
private string _matchPattern;
protected override void Write(PdfContentStreamProcessor processor, PdfLiteral oper, List<PdfObject> operands)
{
base.Write(processor, oper, operands);
}
public override void EditContent(byte[] contentBytes, PdfDictionary resources, PdfContentByte canvas)
{
((RedactRenderListener)base.RenderListener).SetCanvas(canvas);
base.EditContent(contentBytes, resources, canvas);
}
}
//
// A pretty simple render listener, all we care about it text stuff.
// We listen out for text blocks, look for our text, and then put a
// black box over it.. text 'redacted'
//
class RedactRenderListener : IRenderListener
{
private PdfContentByte _canvas;
private string _matchPattern;
public RedactRenderListener(string MatchPattern)
{
_matchPattern = MatchPattern;
}
public RedactRenderListener(PdfContentByte Canvas, string MatchPattern)
{
_canvas = Canvas;
_matchPattern = MatchPattern;
}
public void SetCanvas(PdfContentByte Canvas)
{
_canvas = Canvas;
}
public void BeginTextBlock() { }
public void RenderText(TextRenderInfo renderInfo)
{
var text = renderInfo.GetText();
var match = Regex.Match(text, _matchPattern);
if(match.Success)
{
var p1 = renderInfo.GetCharacterRenderInfos()[match.Index].GetAscentLine().GetStartPoint();
var p2 = renderInfo.GetCharacterRenderInfos()[match.Index+match.Length].GetAscentLine().GetEndPoint();
var p3 = renderInfo.GetCharacterRenderInfos()[match.Index+match.Length].GetDescentLine().GetEndPoint();
var p4 = renderInfo.GetCharacterRenderInfos()[match.Index].GetDescentLine().GetStartPoint();
_canvas.SaveState();
_canvas.SetColorStroke(BaseColor.BLACK);
_canvas.SetColorFill(BaseColor.BLACK);
_canvas.MoveTo(p1[Vector.I1], p1[Vector.I2]);
_canvas.LineTo(p2[Vector.I1], p2[Vector.I2]);
_canvas.LineTo(p3[Vector.I1], p3[Vector.I2]);
_canvas.LineTo(p4[Vector.I1], p4[Vector.I2]);
_canvas.ClosePathFillStroke();
_canvas.RestoreState();
}
}
public void EndTextBlock() { }
public void RenderImage(ImageRenderInfo renderInfo) { }
}
}
Using them with iTextSharp
var reader = new PdfReader("SRC FILE PATH GOES HERE");
var dstFile = File.Open("DST FILE PATH GOES HERE", FileMode.Create);
pdfStamper = new PdfStamper(reader, output, reader.PdfVersion, false);
// We don't need to auto-rotate, as the PdfContentStreamEditor will already deal with pre-rotated space..
// if we enable this we will inadvertently rotate the content.
pdfStamper.RotateContents = false;
// This is for the Text Replace
var replaceTextProcessor = new TextReplaceStreamEditor(
"TEXT TO REPLACE HERE",
"TEXT TO SUBSTITUTE IN HERE");
for(int i=1; i <= reader.NumberOfPages; i++)
replaceTextProcessor.EditPage(pdfStamper, i);
// This is for the Text Redact
var redactTextProcessor = new TextRedactStreamEditor(
"TEXT TO REDACT HERE");
for(int i=1; i <= reader.NumberOfPages; i++)
redactTextProcessor.EditPage(pdfStamper, i);
// Since our redacting just puts a box over the top, we should secure the document a bit... just to prevent people copying/pasting the text behind the box.. we also prevent text to speech processing of the file, otherwise the 'hidden' text will be spoken
pdfStamper.Writer.SetEncryption(null,
Encoding.UTF8.GetBytes("ownerPassword"),
PdfWriter.AllowDegradedPrinting | PdfWriter.AllowPrinting,
PdfWriter.ENCRYPTION_AES_256);
// hey, lets get rid of Javascript too, because it's annoying
pdfStamper.Javascript = "";
// and then finally we close our files (saving it in the process)
pdfStamper.Close();
reader.Close();

You can use GroupDocs.Redaction (available for .NET) for replacing or removing the text from PDF documents. You can perform the exact phrase, case-sensitive and regular expression redaction (removal) of the text. The following code snippet replaces the word "candy" with "[redacted]" in the loaded PDF document.
C#:
using (Document doc = Redactor.Load("D:\\candy.pdf"))
{
doc.RedactWith(new ExactPhraseRedaction("candy", new ReplacementOptions("[redacted]")));
// Save the document to "*_Redacted.*" file.
doc.Save(new SaveOptions() { AddSuffix = true, RasterizeToPDF = false });
}
Disclosure: I work as Developer Evangelist at GroupDocs.

Filepicker in WP8.1 app did not show any photo

I need to make instrument, that allows users to choose photo from Gallery. After choosing photo, it will be shown to the user in ImageBox.
Problem is, that when user chooses some photo in Gallery, Gallery closes, ImageBox stays empty. Code returns no error. Please help me to find error and solve this problem. Thank you.
Here is a code:
ImagePath = String.Empty
filePicker.SuggestedStartLocation = PickerLocationId.PicturesLibrary
filePicker.ViewMode = PickerViewMode.Thumbnail
' Filter to include a sample subset of file types
filePicker.FileTypeFilter.Clear()
filePicker.FileTypeFilter.Add(".bmp")
filePicker.FileTypeFilter.Add(".png")
filePicker.FileTypeFilter.Add(".jpeg")
filePicker.FileTypeFilter.Add(".jpg")
filePicker.PickSingleFileAndContinue()
Dim BitmapImage = New BitmapImage()
Await BitmapImage.SetSourceAsync(filePicker)
MyPhoto.Source = BitmapImage

If you are using filePicker.PickSingleFileAndContinue() you need to add code into the App_Activated
You will notice that PickSingleFileAndContinue() is a void method will not return the picked file were as PickSingleFileAsync() will return a file.
Code block is in C#,sorry i have only limited knowledge in vb, you can find vb sample below
Similar SO Answer
C#
FileOpenPicker openPicker = new FileOpenPicker();
openPicker.ViewMode = PickerViewMode.Thumbnail;
openPicker.SuggestedStartLocation = PickerLocationId.PicturesLibrary;
openPicker.FileTypeFilter.Add(".jpg");
openPicker.FileTypeFilter.Add(".jpeg");
openPicker.FileTypeFilter.Add(".png");
StorageFile file = await openPicker.PickSingleFileAsync();
As your are using PickSingleFileAndContinue() you need to implement ContinuationManager that were you get the file you picked.
In the App.xaml.cs
public ContinuationManager ContinuationManager { get; private set; }
This will fire when the app get activated
protected async override void OnActivated(IActivatedEventArgs e)
{
base.OnActivated(e);
continuationManager = new ContinuationManager();
Frame rootFrame = CreateRootFrame();
await RestoreStatusAsync(e.PreviousExecutionState);
if(rootFrame.Content == null)
{
rootFrame.Navigate(typeof(MainPage));
}
var continuationEventArgs = e as IContinuationActivatedEventArgs;
if (continuationEventArgs != null)
{
Frame scenarioFrame = MainPage.Current.FindName("ScenarioFrame") as Frame;
if (scenarioFrame != null)
{
// Call ContinuationManager to handle continuation activation
continuationManager.Continue(continuationEventArgs, scenarioFrame);
}
}
Window.Current.Activate();
}
Usage in the page
Assume that one page of your app contains code that calls a FileOpenPicker to pick an existing file. In this class, implement the corresponding interface from the ContinuationManager helper class. When your app uses a FileOpenPicker, the interface to implement is IFileOpenPickerContinuable.
public sealed partial class Scenario1 : Page, IFileOpenPickerContinuable
{
...
//inside this you have this
private void PickAFileButton_Click(object sender, RoutedEventArgs e)
{
...
FileOpenPicker openPicker = new FileOpenPicker();
openPicker.ViewMode = PickerViewMode.Thumbnail;
openPicker.SuggestedStartLocation = PickerLocationId.PicturesLibrary;
openPicker.FileTypeFilter.Add(".jpg");
openPicker.FileTypeFilter.Add(".jpeg");
openPicker.FileTypeFilter.Add(".png");
// Launch file open picker and caller app is suspended
// and may be terminated if required
openPicker.PickSingleFileAndContinue();
}
}
switch (args.Kind)
{
case ActivationKind.PickFileContinuation:
var fileOpenPickerPage = rootFrame.Content as IFileOpenPickerContinuable;
if (fileOpenPickerPage != null)
{
fileOpenPickerPage.ContinueFileOpenPicker(args as FileOpenPickerContinuationEventArgs);
}
break;
...
}
Download msdn sample here
Msdn documentation using FilePicker

Text extraction from table cells

I have a pdf. The pdf contains a table. The table contains many cells (>100). I know the exact position (x,y) and dimension (w,h) of every cell of the table.
I need to extract text from cells using itextsharp. Using PdfReaderContentParser + FilteredTextRenderListener (using a code like this http://itextpdf.com/examples/iia.php?id=279 ) I can extract text but I need to run the whole procedure for each cell. My pdf have many cells and the program needs too much time to run. Is there a way to extract text from a list of "rectangle"? I need to know the text of each rectangle. I'm looking for something like PDFTextStripperByArea by PdfBox (you can define as many regions as you need and the get text using .getTextForRegion("region-name") ).

This option is not immediately included in the iTextSharp distribution but it is easy to realize. In the following I use the iText (Java) class, interface, and method names because I am more at home with Java. They should easily be translatable into iTextSharp (C#) names.
If you use the LocationTextExtractionStrategy, you can can use its a posteriori TextChunkFilter mechanism instead of the a priori FilteredRenderListener mechanism used in the sample you linked to. This mechanism has been introduced in version 5.3.3.
For this you first parse the whole page content using the LocationTextExtractionStrategy without any FilteredRenderListener filtering applied. This makes the strategy object collect TextChunk objects for all PDF text objects on the page containing the associated base line segment.
Then you call the strategy's getResultantText overload with a TextChunkFilter argument (instead of the regular no-argument overload):
public String getResultantText(TextChunkFilter chunkFilter)
You call it with a different TextChunkFilter instance for each table cell. You have to implement this filter interface which is not too difficult as it only defines one method:
public static interface TextChunkFilter
{
/**
* #param textChunk the chunk to check
* #return true if the chunk should be allowed
*/
public boolean accept(TextChunk textChunk);
}
So the accept method of the filter for a given cell must test whether the text chunk in question is inside your cell.
(Instead of separate instances for each cell you can of course also create one instance whose parameters, i.e. cell coordinates, can be changed between getResultantText calls.)
PS: As mentioned by the OP, this TextChunkFilter has not yet been ported to iTextSharp. It should not be hard to do so, though, only one small interface and one method to add to the strategy.
PPS: In a comment sschuberth asked
Do you then still call PdfTextExtractor.getTextFromPage() when using getResultantText(), or does it somehow replace that call? If so, how to you then specify the page to extract to?
Actually PdfTextExtractor.getTextFromPage() internally already uses the no-argument getResultantText() overload:
public static String getTextFromPage(PdfReader reader, int pageNumber, TextExtractionStrategy strategy, Map<String, ContentOperator> additionalContentOperators) throws IOException
{
PdfReaderContentParser parser = new PdfReaderContentParser(reader);
return parser.processContent(pageNumber, strategy, additionalContentOperators).getResultantText();
}
To make use of a TextChunkFilter you could simply build a similar convenience method, e.g.
public static String getTextFromPage(PdfReader reader, int pageNumber, LocationTextExtractionStrategy strategy, Map<String, ContentOperator> additionalContentOperators, TextChunkFilter chunkFilter) throws IOException
{
PdfReaderContentParser parser = new PdfReaderContentParser(reader);
return parser.processContent(pageNumber, strategy, additionalContentOperators).getResultantText(chunkFilter);
}
In the context at hand, though, in which we want to parse the page content only once and apply multiple filters, one for each cell, we might generalize this to:
public static List<String> getTextFromPage(PdfReader reader, int pageNumber, LocationTextExtractionStrategy strategy, Map<String, ContentOperator> additionalContentOperators, Iterable<TextChunkFilter> chunkFilters) throws IOException
{
PdfReaderContentParser parser = new PdfReaderContentParser(reader);
parser.processContent(pageNumber, strategy, additionalContentOperators)
List<String> result = new ArrayList<>();
for (TextChunkFilter chunkFilter : chunkFilters)
{
result.add(strategy).getResultantText(chunkFilter);
}
return result;
}
(You can make this look fancier by using Java 8 collection streaming instead of the old'fashioned for loop.)

Here's my take on how to extract text from a table-like structure in a PDF using itextsharp. It returns a collection of rows and each row contains a collection of interpreted columns. This may work for you on the premise that there is a gap between one column and the next which is greater than the average width of a single character. I also added an option to check for wrapped text within a virtual column. Your mileage may vary.
using (PdfReader pdfReader = new PdfReader(stream))
{
for (int page = 1; page <= pdfReader.NumberOfPages; page++)
{
TableExtractionStrategy tableExtractionStrategy = new TableExtractionStrategy();
string pageText = PdfTextExtractor.GetTextFromPage(pdfReader, page, tableExtractionStrategy);
var table = tableExtractionStrategy.GetTable();
}
}
public class TableExtractionStrategy : LocationTextExtractionStrategy
{
public float NextCharacterThreshold { get; set; } = 1;
public int NextLineLookAheadDepth { get; set; } = 500;
public bool AccomodateWordWrapping { get; set; } = true;
private List<TableTextChunk> Chunks { get; set; } = new List<TableTextChunk>();
public override void RenderText(TextRenderInfo renderInfo)
{
base.RenderText(renderInfo);
string text = renderInfo.GetText();
Vector bottomLeft = renderInfo.GetDescentLine().GetStartPoint();
Vector topRight = renderInfo.GetAscentLine().GetEndPoint();
Rectangle rectangle = new Rectangle(bottomLeft[Vector.I1], bottomLeft[Vector.I2], topRight[Vector.I1], topRight[Vector.I2]);
Chunks.Add(new TableTextChunk(rectangle, text));
}
public List<List<string>> GetTable()
{
List<List<string>> lines = new List<List<string>>();
List<string> currentLine = new List<string>();
float? previousBottom = null;
float? previousRight = null;
StringBuilder currentString = new StringBuilder();
// iterate through all chunks and evaluate
for (int i = 0; i < Chunks.Count; i++)
{
TableTextChunk chunk = Chunks[i];
// determine if we are processing the same row based on defined space between subsequent chunks
if (previousBottom.HasValue && previousBottom == chunk.Rectangle.Bottom)
{
if (chunk.Rectangle.Left - previousRight > 1)
{
currentLine.Add(currentString.ToString());
currentString.Clear();
}
currentString.Append(chunk.Text);
previousRight = chunk.Rectangle.Right;
}
else
{
// if we are processing a new line let's check to see if this could be word wrapping behavior
bool isNewLine = true;
if (AccomodateWordWrapping)
{
int readAheadDepth = Math.Min(i + NextLineLookAheadDepth, Chunks.Count);
if (previousBottom.HasValue)
for (int j = i; j < readAheadDepth; j++)
{
if (previousBottom == Chunks[j].Rectangle.Bottom)
{
isNewLine = false;
break;
}
}
}
// if the text was not word wrapped let's treat this as a new table row
if (isNewLine)
{
if (currentString.Length > 0)
currentLine.Add(currentString.ToString());
currentString.Clear();
previousBottom = chunk.Rectangle.Bottom;
previousRight = chunk.Rectangle.Right;
currentString.Append(chunk.Text);
if (currentLine.Count > 0)
lines.Add(currentLine);
currentLine = new List<string>();
}
else
{
if (chunk.Rectangle.Left - previousRight > 1)
{
currentLine.Add(currentString.ToString());
currentString.Clear();
}
currentString.Append(chunk.Text);
previousRight = chunk.Rectangle.Right;
}
}
}
return lines;
}
private struct TableTextChunk
{
public Rectangle Rectangle;
public string Text;
public TableTextChunk(Rectangle rect, string text)
{
Rectangle = rect;
Text = text;
}
public override string ToString()
{
return Text + " (" + Rectangle.Left + ", " + Rectangle.Bottom + ")";
}
}
}

Dataexporter pdf header position

I'm using dataexporter to create a pdf of a data table, in my data table the header of the columns is centralized, however the pdf version of the same columns is align to the left. how can I make the columns of the pdf be centralized like the data table.

I use the solution to customize the PDFExporter, it work very well, thank you for your attention. Below is how i've done:
My custom class:
public class CustomPDFExporter extends PDFExporter {
#Override
protected void addColumnFacets(DataTable table, PdfPTable pdfTable, ColumnType columnType) {
for(UIColumn col : table.getColumns()) {
if(!col.isRendered()) {
continue;
}
if(col instanceof DynamicColumn) {
((DynamicColumn) col).applyModel();
}
if(col.isExportable()) {
addHeaderValue(pdfTable, col.getFacet(columnType.facet()), FontFactory.getFont(FontFactory.TIMES, "iso-8859-1", Font.DEFAULTSIZE, Font.BOLD));
}
}
}
protected void addHeaderValue(PdfPTable pdfTable, UIComponent component, Font font) {
String value = component == null ? "" : exportValue(FacesContext.getCurrentInstance(), component);
PdfPCell cell = new PdfPCell(new Paragraph(value, font));
cell.setHorizontalAlignment(Element.ALIGN_CENTER);
pdfTable.addCell(cell);
}
}
bean:
public void exportPDF(DataTable table, String filename) throws IOException {
FacesContext context = FacesContext.getCurrentInstance();
Exporter exporter = new CustomPDFExporter();
exporter.export(context, table, filename, false, false, "iso-8859-1", null, null);
context.responseComplete();
}
In my page I added:
<h:commandLink action="#{boxBean.exportPDF(boxTable, 'relatorio_caixas')}" >
<p:graphicImage value="/resources/img/pdf.png"/>
</h:commandLink>

Well your answer is already on stackoverflow: changing style on generating pdf with Primefaces dataExporter
Also take a look here: http://www.primefaces.org/showcase/ui/exporterProcessor.jsf how to use the exportProcessor of Primefaces.
In short you need to create your own processor to create an custom PDF

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas

Find rectangle in pdf using iText 7 - pdf

Related

How to set variable inside of a Coroutine after yielding a webrequest

How to replace / remove text from a PDF file

Filepicker in WP8.1 app did not show any photo

Text extraction from table cells

Dataexporter pdf header position

Categories

Resources