How to measure the text length using PDFsharp library

How to measure the text length using PDFsharp library - pdf

I have a requirement to measure the text length in a PDF and wrap the line if the length exceeds a certain amount. I am already using PDFsharp library.
I already used the following code to determine the length of the text.
public static Size MeasureString(string s, Font font)
{
SizeF result;
using (var image = new Bitmap(1, 1))
{
using (var g = Graphics.FromImage(image))
{
result = g.MeasureString(s, font);
}
}
return result.ToSize();
}
As I understood I am pretty dependent of the resolution and dpi to convert Height and Width properties of the Size class to millimeter. But according to the PDFsharp's team answer in this post "PDF files are vector files that have no DPI".
So I am a bit confused about the right way to measure the text length using this library.

PDF files have no pixels, PDF files have no DPI.
The standard unit with PDFsharp is points. There are 72 points per inch.
You can have the length of the text in points, mm, cm, inch, ...
You can have the width of the page in points, mm, cm, inch, ...
The XTextFormatter class can do simple line-wrapping for you:
http://www.pdfsharp.net/wiki/TextLayout-sample.ashx
This sample shows how to call MeasureString:
http://www.pdfsharp.net/wiki/Graphics-sample.ashx#Show_how_to_get_text_metric_information_19
Use the correct MeasureString method with an XGraphics object and you will get an XSize object with the text dimensions - no pixels, but mm, cm, inch, point, ...
Use MigraDoc for line-wrapping with sophisticated text formatting.
The Wikipedia article on Points: https://en.wikipedia.org/wiki/Point_(typography)

Related

PDF Formfield font size: Default appearance vs. appearance stream

I extract the font size of a form field to get the information about its size (using iText). This works well for most of the documents however for some I get a font size of 1 because in the appearance the font size is 1. However if I open the PDF in several different viewers the size of this text field is always 8. I thought a form field should be rendered according to its appearance? So why do PDF Viewers use the default appearance and not the font size defined in the appearance stream?
Update: As mentioned by MKL I did forget to consider the text matrix.
I did implement my own RenderListener for the font. Does anyone know how to apply the scaling?
public class PdfStreamFontExtractor implements RenderListener{
#Override
public void usedFont(DocumentFont font, float fontSize) {
this.font=font;
this.fontSize=fontSize;
}
#Override
public void renderText(TextRenderInfo renderInfo) {
//get scaling factor from textToUserSpaceTransformMatrix?
}
...
}

But the effective font size is 8 even in the appearance stream!
Have a look at the whole text object:
BT
8 0 0 8 2 5.55 Tm
/TT1 1 Tf
[...] TJ
ET
The text matrix set at the start scales everything by a factor of 8. Thus, the text drawn thereafter has an effective font size of 8 × 1 = 8.
Admittedly, while you can see scaling text matrices in combination with size 1 Tf instructions in regular contents pretty often, I have not seen that in form field appearances yet. It's pretty uncommon, I'd assume.
Concerning your update...
Does anyone know how to apply the scaling?
#Override
public void renderText(TextRenderInfo renderInfo) {
//get scaling factor from textToUserSpaceTransformMatrix?
}
Well, it depends on how you want to measure the transformed size.
One approach would be to take a vertical vector as long as the font size (as given in the Tf instruction), transform it by textToUserSpaceTransformMatrix, and take the length of the transformation result:
#Override
public void renderText(TextRenderInfo renderInfo) {
scaledfontSize=renderInfo.getTransformedFontSize(unScaledfontSize);
}
public class TextRenderInfo {
...
public float getTransformedFontSize(float fontSize){
return new Vector(0, fontSize, 0).cross(this.textToUserSpaceTransformMatrix).length();
}
...
If the transformation only consists of reflections, rotations and scaling, the result should be as desired. If skewing effects are involved, you might want to project that transformed vector onto the plane perpendicular to the transformed writing direction before taking the length.

How to convert PDF to image with same resolution as original PDF

I am using Aspose.Pdf to convert PDF to Image and latter convert Image back to Pdf. I would like to keep resolution, width and height of the new Pdf same as original Pdf.
using (var newDocument = new Document())
{
using (var originalDocument = new Document("c:\\source.pdf"))
{
foreach (Aspose.Pdf.Page page in originalDocument.Pages)
{
if (page.Rotate != Aspose.Pdf.Rotation.None)
{
using (var ms = new MemoryStream())
{
// Create Jpegdevice with specified attributes
// Width, Height, Resolution, Quality
// Quality [0-100], 100 is Maximum
// Create Resolution object
Resolution resolution = new Resolution(Convert.ToInt32(page.ArtBox.Width), Convert.ToInt32(page.ArtBox.Height));
JpegDevice device = new JpegDevice(resolution, 100);
// Convert a particular page and save the image to stream
device.Process(page, ms);
var newPage = newDocument.Pages.Add();
// Set margins so image will fit, etc.
newPage.PageInfo.Margin.Bottom = 0;
newPage.PageInfo.Margin.Top = 0;
newPage.PageInfo.Margin.Left = 0;
newPage.PageInfo.Margin.Right = 0;
// Create an image object
Aspose.Pdf.Image image1 = new Aspose.Pdf.Image();
// Set the image file stream
image1.ImageStream = ms;
// Add the image into paragraphs collection of the section
newPage.Paragraphs.Add(image1);
// YES, save after every page otherwise we get out of memory exception
newDocument.Save(txtFolder.Text + "\\out.pdf");
}
}
else
{
newDocument.Pages.Add(page);
newDocument.Save(txtFolder.Text + "\\out.pdf");
}
}
}
}
Qustions
1> The original PDF size is 18MB but the above code creates large file of size 163MB. I would like keep the size as it is or least somewhat near to original file size by keeping the same resolution.
2>Resolution class also takes int valueX and int valueY parameters. And JpegDevice class takes int width and int height parameters. However it does not specify unit, weathers its in points, pixels, inches etc?
As per Aspose.Pdf documentation
The conversion from point to pixel depends on an image's DPI (dots per
inch) property. For example, if an image's DPI is 96 (96 pixels for
each inch), and it is 100 points high, its height in pixels is (100 /
72) * 96 = 133.3. The general formula is: pixels = ( points / 72 ) *
DPI.
However that does not answer the question what unit the API is asking?
Original Issue
We get PDF from our clients. Some of the PDF we receive has rotation of 90 or 270 degree. However when you open them in Adobe or any other PDF viewer, the page's orientation is correct (it is not landscape or horizontal). All this mess i have to do becuase Aspose.Pdf cannot keep the same orientation when change the rotation to None. So suggested approach is convert PDF to Image and latter convert Image back to Pdf
https://forum.aspose.com/t/cannot-set-cropbox-on-rotated-page/201659/3
https://forum.aspose.com/t/remove-rotation-from-a-document-and-set-it-upright/29170

iText - PDFAppearence issue

We're using iText to put a text inside a signature placeholder in a PDF. We use a code snippet similar to this to define the Signature Appearence
PdfStamper stp = PdfStamper.createSignature(inputReader, os, '\0', tempFile2, true);
sap = stp.getSignatureAppearance();
sap.setVisibleSignature(placeholder);
sap.setRenderingMode(PdfSignatureAppearance.RenderingMode.DESCRIPTION);
sap.setCertificationLevel(PdfSignatureAppearance.NOT_CERTIFIED);
Calendar cal = Calendar.getInstance();
sap.setSignDate(cal);
sap.setLayer2Text(text+"\n"+cal.getTime().toString());
sap.setReason(text+"\n"+cal.getTime().toString()); `
Everything works fine, but the signature text does not fill all the signature placeholder area as expected by us, but the area filled seems to have an height that is approximately the 70% of the available space.
As a result, sometimes especially if the length of the signature text is quite big, the signature text does not fit in the placeholder and the text is striped away.
Example of filled Signature:
I looked into the PdfSignatureAppearence class and I found this code snippet in the getApperance() method that is responsible of this behaviour and is invoked when
sap.setRenderingMode(PdfSignatureAppearance.RenderingMode.DESCRIPTION);
is being called
else {
dataRect = new Rectangle(
MARGIN,
MARGIN,
rect.getWidth() - MARGIN,
rect.getHeight() * (1 - TOP_SECTION) - MARGIN);
}
I don't get the reason for that, because I expect that the text could use all the available placeholder height, with the proper margin.
Is there any way to bypass this behaviour?
We are using iText 5.4.2, but also newer version contains same code snippet so I expect that the behaviour will be same.

As #JJ. already commented,
TOP_SECTION is connected with acro6layers rendering and the code [determining the datarect in pure DESCRIPTION mode] does not take into account the value of the acro6layer flag.
Unless one wants to fix this in the iText 5 code itself, the easiest way to make one's description use the whole signature space is to construct the layer 2 appearance oneself.
To do so one merely has to retrieve a PdfTemplate from PdfSignatureAppearance.getLayer(2) and fill it as desired after one has called PdfSignatureAppearance.setVisibleSignature. The PdfSignatureAppearance remembers that you already have retrieved the layer 2 and doesn't change it anymore.
For the case at hand we essentially copy the PdfSignatureAppearance.getAppearance code for generating layer 2 in pure DESCRIPTION mode, merely correcting the code determining the datarect:
PdfSignatureAppearance appearance = ...;
[...]
appearance.setVisibleSignature(new Rectangle(36, 748, 144, 780), 1, "sig");
PdfTemplate layer2 = appearance.getLayer(2);
String text = "We're using iText to put a text inside a signature placeholder in a PDF. "
+ "We use a code snippet similar to this to define the Signature Appearence.\n"
+ "Everything works fine, but the signature text does not fill all the signature "
+ "placeholder area as expected by us, but the area filled seems to have an height "
+ "that is approximately the 70% of the available space.\n"
+ "As a result, sometimes especially if the length of the signature text is quite "
+ "big, the signature text does not fit in the placeholder and the text is striped "
+ "away.";
Font font = new Font();
float size = font.getSize();
final float MARGIN = 2;
Rectangle dataRect = new Rectangle(
MARGIN,
MARGIN,
appearance.getRect().getWidth() - MARGIN,
appearance.getRect().getHeight() - MARGIN);
if (size <= 0) {
Rectangle sr = new Rectangle(dataRect.getWidth(), dataRect.getHeight());
size = ColumnText.fitText(font, text, sr, 12, appearance.getRunDirection());
}
ColumnText ct = new ColumnText(layer2);
ct.setRunDirection(appearance.getRunDirection());
ct.setSimpleColumn(new Phrase(text, font), dataRect.getLeft(), dataRect.getBottom(), dataRect.getRight(), dataRect.getTop(), size, Element.ALIGN_LEFT);
ct.go();
(CreateSignature.java test signWithCustomLayer2)
(As description text I used some paragraphs from the question body.)
The result:
By adapting the MARGIN value in the code above, one can even use more are. As that can result in the text touching the border, though, that might not be really beautiful.
As an aside:
if the length of the signature text is quite big, the signature text does not fit in the placeholder and the text is striped away.
If you initialize the size variable above with a non-positive value, the code in the if (size <= 0) block will calculate a font size which allows all of the text to fit into the signature rectangle. This does happen in the code above as new Font() returns a font with a size of UNDEFINED which is a constant -1.

Get font height/weight from TextRenderInfo how?

When I parse an existing PDF using iText(Sharp), I create an object which implements IRenderListener which I pass into PdfReaderContentParser.ProcessContent() and sure enough, my object's RenderText() gets called repeatedly with all the text in the PDF.
The problem is, the TextRenderInfo tells me about the base font (in my case, Helvetica) but I can't tell the height of the font nor its weight (regular vs. bold). Is this a known deficiency of iText(Sharp) or am I missing something?

the TextRenderInfo tells me about the base font (in my case, Helvetica) but I can't tell the height of the font nor its weight (regular vs. bold)
Height
Unfortunately iTextSharp does not provide a public font size method or member in the TextRenderInfo. Some people worked around this by using the distance between its GetAscentLine() and its GetDescentLine().
If you are ready to use Reflection, though, you can do better by exposing and using the private TextRenderInfo member GraphicsState gs, e.g. like in this render listener:
public class LocationTextSizeExtractionStrategy : LocationTextExtractionStrategy
{
//Hold each coordinate
public List<SizeAndTextAndFont> myChunks = new List<SizeAndTextAndFont>();
//Automatically called for each chunk of text in the PDF
public override void RenderText(TextRenderInfo wholeRenderInfo)
{
base.RenderText(wholeRenderInfo);
GraphicsState gs = (GraphicsState) GsField.GetValue(wholeRenderInfo);
myChunks.Add(new SizeAndTextAndFont(gs.FontSize, wholeRenderInfo.GetText(), wholeRenderInfo.GetFont().PostscriptFontName));
}
FieldInfo GsField = typeof(TextRenderInfo).GetField("gs", System.Reflection.BindingFlags.NonPublic | System.Reflection.BindingFlags.Instance);
}
//Helper class that stores our rectangle, text, and font
public class SizeAndTextAndFont
{
public float Size;
public String Text;
public String Font;
public SizeAndTextAndFont(float size, String text, String font)
{
this.Size = size;
this.Text = text;
this.Font = font;
}
}
You can extract information with such a render listener like this:
using (var pdfReader = new PdfReader(testFile))
{
// Loop through each page of the document
for (var page = startPage; page < endPage; page++)
{
Console.WriteLine("\n Page {0}", page);
LocationTextSizeExtractionStrategy strategy = new LocationTextSizeExtractionStrategy();
PdfTextExtractor.GetTextFromPage(pdfReader, page, strategy);
foreach (SizeAndTextAndFont p in strategy.myChunks)
{
Console.WriteLine(string.Format("<{0}> in {2} at {1}", p.Text, p.Size, p.Font));
}
}
}
This produces an output like this:
Page 1
< The Philippine Stock Exchange, Inc> in Helvetica-Bold at 8
< Daily Quotations Report> in Helvetica-Bold at 8
< March 23 , 2015> in Helvetica-Bold at 8
<Name> in Helvetica at 7
<Symbol> in Helvetica at 7
<Bid> in Helvetica at 7
[...]
Considering transformations
The numbers you see in the output as font sizes are the values of the font size property in the PDF graphics state at the time the respective text is drawn.
Due to the flexibility of PDF this may not be font size you eventually see in the output, though, a custom transformation may stretch the output considerably. Some PDF producers even always use a font size of 1 and transformations to stretch the output accordingly.
To get a good value for font sizes in such documents, you can improve the LocationTextSizeExtractionStrategy method RenderText like this:
public override void RenderText(TextRenderInfo wholeRenderInfo)
{
base.RenderText(wholeRenderInfo);
GraphicsState gs = (GraphicsState) GsField.GetValue(wholeRenderInfo);
Matrix textToUserSpaceTransformMatrix = (Matrix) TextToUserSpaceTransformMatrixField.GetValue(wholeRenderInfo);
float transformedFontSize = new Vector(0, gs.FontSize, 0).Cross(textToUserSpaceTransformMatrix).Length;
myChunks.Add(new SizeAndTextAndFont(transformedFontSize, wholeRenderInfo.GetText(), wholeRenderInfo.GetFont().PostscriptFontName));
}
with this additional reflection FieldInfo member.
FieldInfo TextToUserSpaceTransformMatrixField = typeof(TextRenderInfo).GetField("textToUserSpaceTransformMatrix", System.Reflection.BindingFlags.NonPublic | System.Reflection.BindingFlags.Instance);
Weight
As you can see in the output above, the name of the font may contain more than the mere font family name but also a weight indicator
< March 23 , 2015> in Helvetica-Bold at 8
In your example, therefore,
the TextRenderInfo tells me about the base font (in my case, Helvetica)
the Helvetica without any decorations would imply a regular weight.
Helvetica is one of the standard 14 fonts which every PDF viewer must provide out-of-the-box: Times-Roman, Helvetica, Courier, Symbol, Times-Bold, Helvetica-Bold, Courier-Bold, ZapfDingbats, Times-Italic, Helvetica-Oblique, Courier-Oblique, Times-BoldItalic, Helvetica-BoldOblique, Courier-BoldOblique. Thus, these names are pretty dependable.
Unfortunately font names in general may be chosen arbitrarily; a bold font may have "Bold" or "Black" or other indicators of boldness in its name or none at all.
One might also try to use the font's FontDescriptor dictionary for which an entry FontWeight is specified. Unfortunately this entry is optional, you cannot count on it being there at all.
Furthermore, a font in a PDF can be artificially bold'ed, cf. this answer:
All these numbers are drawn using the same font, merely adding a rising outline line width.
Thus, I'm afraid there is no dependable way to find the exact font weight, merely a number of heuristics which may or may not return acceptable approximations.

Increase left margin of an existing pdf using iTextSharp [duplicate]

My web application signs PDF documents. I would like to let users download the original PDF document (not signed) but adding an image and the signers in the left margin of the pdf document.
I've seen this idea in another web application, and I would like to do the same. Of course I would like to do it using itext library.
I have attached two images, the original PDF document (not signed) and the modified PDF document.

First this: it is important to change the document before you digitally sign it. Once digitally signed, these changes will break the signature.
I will break up the question in two parts and I'll skip the part about the actual watermarking as this is already explained here: How to watermark PDFs using text or images?
This question is not a duplicate of that question, because of the extra requirement to add an extra margin to the right.
Take a look at the primes.pdf document. This is the source file we are going to use in the AddExtraMargin example with the following result: primes_extra_margin.pdf. As you can see, a half an inch margin was added to the left of each page.
This is how it's done:
public void manipulatePdf(String src, String dest) throws IOException, DocumentException {
PdfReader reader = new PdfReader(src);
int n = reader.getNumberOfPages();
PdfStamper stamper = new PdfStamper(reader, new FileOutputStream(dest));
// properties
PdfContentByte over;
PdfDictionary pageDict;
PdfArray mediabox;
float llx, lly, ury;
// loop over every page
for (int i = 1; i <= n; i++) {
pageDict = reader.getPageN(i);
mediabox = pageDict.getAsArray(PdfName.MEDIABOX);
llx = mediabox.getAsNumber(0).floatValue();
lly = mediabox.getAsNumber(1).floatValue();
ury = mediabox.getAsNumber(3).floatValue();
mediabox.set(0, new PdfNumber(llx - 36));
over = stamper.getOverContent(i);
over.saveState();
over.setColorFill(new GrayColor(0.5f));
over.rectangle(llx - 36, lly, 36, ury - llx);
over.fill();
over.restoreState();
}
stamper.close();
reader.close();
}
The PdfDictionary we get with the getPageN() method is called the page dictionary. It has plenty of information about a specific page in the PDF. We are only looking at one entry: the /MediaBox. This is only a proof of concept. If you want to write a more robust application, you should also look at the /CropBox and the /Rotate entry. Incidentally, I know that these entries don't exist in primes.pdf, so I am omitting them here.
The media box of a page is an array with four values that represent a rectangle defined by the coordinates of its lower-left and upper-right corner (usually, I refer to them as llx, lly, urx and ury).
In my code sample, I change the value of llx by subtracting 36 user units. If you compare the page size of both PDFs, you'll see that we've added half an inch.
We also use these coordinates to draw a rectangle that covers the extra half inch. Now switch to the other watermark examples to find out how to add text or other content to each page.
Update:
if you need to scale down the existing pages, please read Fix the orientation of a PDF in order to scale it

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas

How to measure the text length using PDFsharp library - pdf

Related

PDF Formfield font size: Default appearance vs. appearance stream

How to convert PDF to image with same resolution as original PDF

iText - PDFAppearence issue

Get font height/weight from TextRenderInfo how?

Increase left margin of an existing pdf using iTextSharp [duplicate]

Categories

Resources