iTextSharp reverses Arabic text when filling combed text field - pdf

I have a problem with iTextSharp which looks like it could be a bug.
I have a combed text field and when using iTextSharp to add Arabic text to it, the Arabic letters initially appear reversed when the field is "highlighted". So 'ف ا د ي' appears 'ي د ا ف'.
The moment I click on the field, the highlight disappears and the text appears in the correct direction.
This happens regardless of the direction and alignment and only happens in combed text fields.
Can anyone offer any solutions to this?
Note: I've added the iText tag as well because I have a hunch that this issue is not specific to iTextSharp only and I hope I can replicate any workarounds or solutions in iTextSharp. Regards,

You can usually fix this by setting GenerateAppearances to false on the form object.
Annotations in a PDF (form fields are one kind of annotation) can have different "states", and for each state you can specify how a renderer should display it. For instance, a checkbox is either "checked" or "not checked"; that much is given, but how to render the actual checkmark isn't. Maybe an "X", maybe a ✓, maybe a ☑ or maybe something totally different. These named states are called appearance states, and the drawing instructions for each one are stored in the annotation's appearance streams.
If you don't set an appearance state for an annotation then you are effectively surrendering control of that state to the PDF renderer and letting it do whatever it wants.
Adobe's renderers (Acrobat and Reader) are the de facto standard for PDF renderers and recent ones are actually really good at "filling in the blanks", especially when it comes to things like RTL and many non-English/Latin things. Other renderers out there, including Google's, Apple's, Microsoft's and even your printer might not be as good at this, however, so you might want to test this.

Related

How to change a font in a PDF, a single glyph renders wrong (PDF created using Adobe Acrobat Pro XI, with text recognition "clear scan")

I have a document that was created from a scanned document, after using the Acrobat XI pro's text recognition tool, with parameters language: Spanish; PDF output: clear scan; downsample to 600 dpi.
It worked rather well, with only small problems that can easily be overlooked. However, I use Foxit PDF Reader to actually read PDFs (I have a slow PC), and there is an "a" glyph that looks normal in Adobe but looks filled in Foxit, without the empty space at its center. The problem exists only in the italic lowercase "a".
(example of problem). There are lots of lowercase italic a's, on almost every other page. I use this book to study for a central course for my degree; it's the best we have in Spanish at our school's library, so I read it almost every day, and it's quite annoying (example 2).
Some instances of that italic lowercase "a" do show up fine in Foxit: the a's in "plantación" are normal.
Sample pages: the first page has normal a's, the second has filled a's.
Could I copy the normal-looking "a" glyph and replace the one that causes the problem? If so, what software would I need?
Thanks for reading this.
Yes, it is possible to change the ClearScan font (Fd1428390-Identity-H) to a conventional font; here it was changed to 11pt Times Roman Italic. I also changed the colour, size and bold to demonstrate the effects, but you only need to use one combination.
This change is allowed in the free version of Tracker PDF-XChange Editor, but beware: if not done cautiously, text edits can trigger demo watermarks.
Select "edit text only" from the buttons, then select the text and, with the properties pane active (on the right), make your changes. If you see the demo banner appear, press Ctrl-Z and try a different approach.

Determine the Text that can Display in Multiline PDTextField

Is there a way to determine the text that will actually display in a PDTextField when the PDF prints? If I call setValue and then getValue, it returns all of the text even though it will not all display.
I am trying to fill out a form with a limited size multiline text field that has the notation to attach another page for more details. I would like to limit the text to that which will display and generate the added detail page.
Thanks for indulging a PDFbox newbie.
There is no direct way to find that out, because the details of the text layout, such as line breaks, padding and line spacing, are hidden inside the non-public class PlainTextFormatter in the org.apache.pdfbox.pdmodel.interactive.form package. So you'd need to replicate that code.
PDFBox tries to resemble the calculations done by Adobe Acrobat and Adobe Reader, but the details of those calculations are not part of the PDF specification. So your own calculation is only valid for applications that use a similar layout model. Other form-filling applications might use a slightly different layout model, and your results will not apply to them.
In addition, Acrobat (and PDFBox) will place text even when it ends up partially clipped. Look at the results of the AlignmentTest.java unit test to see what I mean. So expectations of what 'fitting' really means may differ.
As I had already thought about passing the information about which text fitted back to the calling application, I've opened an enhancement request for that: https://issues.apache.org/jira/browse/PDFBOX-3413
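A rough estimate of the split can be sketched without touching PDFBox internals. The code below is only a greedy word-wrap under stated assumptions: the class `FieldFitEstimate`, its `Split` result type, and the `widthOf` callback are hypothetical names, not PDFBox API. The caller is assumed to supply a function that measures a string in the field's font and size, and a `maxLines` value derived from the field height and line height. It does not reproduce PlainTextFormatter, so the result may differ from what PDFBox or Acrobat actually render.

```java
import java.util.Arrays;
import java.util.function.ToDoubleFunction;

// Hypothetical helper, not part of PDFBox: estimates which words fit in a
// multiline field using a simple greedy word wrap.
final class FieldFitEstimate {

    /** Text estimated to be visible, and the remainder for an extra page. */
    static final class Split {
        final String visible;
        final String overflow;

        private Split(String visible, String overflow) {
            this.visible = visible;
            this.overflow = overflow;
        }
    }

    /**
     * @param fieldWidth usable width of the field (same unit as widthOf)
     * @param maxLines   number of lines assumed to fit in the field height
     * @param widthOf    assumed measuring function for the field's font/size
     */
    static Split split(String text, double fieldWidth, int maxLines,
                       ToDoubleFunction<String> widthOf) {
        String[] words = text.trim().split("\\s+");
        int linesUsed = 0;
        int next = 0; // index of the first word not yet placed

        while (next < words.length && linesUsed < maxLines) {
            // Start a new line; a single overlong word still takes a line.
            String line = words[next++];
            // Keep appending words while the line still fits the width.
            while (next < words.length
                    && widthOf.applyAsDouble(line + " " + words[next]) <= fieldWidth) {
                line = line + " " + words[next++];
            }
            linesUsed++;
        }

        String visible = String.join(" ", Arrays.copyOfRange(words, 0, next));
        String overflow = String.join(" ", Arrays.copyOfRange(words, next, words.length));
        return new Split(visible, overflow);
    }
}
```

The `visible` part would go into the field and the `overflow` part onto the attached detail page. Leaving a safety margin (for example one line fewer than the computed maximum) seems prudent, since viewers may lay the text out differently.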

Highlighting text?

I've been searching for a while to find out how highlighting text (in books in PDF, EPUB or MOBI format) works programmatically, and what kind of code or technology lies behind this feature, but I couldn't find anything. If you know, please share any hints here.
Thanks in advance.
For PDFs you can highlight text by adding a highlight text markup annotation:
12.5.6.10 Text Markup Annotations
Text markup annotations shall appear as highlights, underlines, strikeouts (all PDF 1.3), or jagged (“squiggly”) underlines (PDF 1.4) in the text of a document. When opened, they shall display a pop-up window containing the text of the associated note. Table 179 shows the annotation dictionary entries specific to these types of annotations.
(ISO 32000-1)
You can find details in Table 179. For a specification of annotations in general, read the earlier subsections of section 12.5 Annotations.
Alternatively, you can add the highlighting to the page content itself. This has the disadvantage that others cannot easily change the highlight (which might actually be an advantage in some use cases). Depending on which PDF viewers you need to support, you might be forced to do this, though.
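To make the annotation approach more concrete, here is a minimal sketch that only builds the text of a highlight annotation dictionary for one rectangle of text. The class and method names are made up for illustration, and writing the dictionary into a real file is left to whatever PDF library you use. The keys /Subtype /Highlight, /Rect, /QuadPoints and /C come from ISO 32000-1. The point order used for /QuadPoints (upper-left, upper-right, lower-left, lower-right) is an assumption: it is the order commonly reported to work in Adobe's viewers, so verify it against the viewers you target.

```java
import java.util.Locale;

// Illustrative only: produces the PDF syntax of a highlight annotation
// dictionary; it does not write or modify a PDF file.
final class HighlightAnnotationSketch {

    /** QuadPoints for one axis-aligned box, ordered UL, UR, LL, LR (assumed order). */
    static double[] quadPoints(double llx, double lly, double urx, double ury) {
        return new double[] { llx, ury, urx, ury, llx, lly, urx, lly };
    }

    /** Annotation dictionary for a yellow highlight covering the given box. */
    static String dictionary(double llx, double lly, double urx, double ury) {
        StringBuilder sb = new StringBuilder("<< /Type /Annot /Subtype /Highlight");
        sb.append(" /Rect [").append(join(new double[] { llx, lly, urx, ury })).append("]");
        sb.append(" /QuadPoints [").append(join(quadPoints(llx, lly, urx, ury))).append("]");
        sb.append(" /C [1 1 0] >>"); // RGB yellow
        return sb.toString();
    }

    private static String join(double[] values) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < values.length; i++) {
            if (i > 0) {
                sb.append(' ');
            }
            // Locale.ROOT keeps '.' as the decimal separator, as PDF requires.
            sb.append(String.format(Locale.ROOT, "%.2f", values[i]));
        }
        return sb.toString();
    }
}
```

The box coordinates would come from the position of the text on the page, which you need a text extraction step to determine; a highlight spanning several lines needs one group of eight numbers per line in /QuadPoints.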

Possible to control PDF layout with iText?

I'm writing some logic to build a large single PDF file that our users can print at their convenience. I'm using Java's iText library (through Clojure's clj-pdf).
I'm trying to have the PDF show the same exact template form on every single page, however I can't seem to find any documentation or indication that one can have PDF content "fit to a page".
The text in these forms varies a little bit, so there's a chance it might require more or fewer text lines per page. This means that the content has a chance of spilling over to the next page, or being too short, making the next page creep up into the previous one, breaking the requirement of "one form per page" for the rest of the document.
I'm trying to figure out if my only option is to manually check the length of the text on each page and potentially crop it by hand if it goes over n lines, or if the PDF format somehow supports a smart way of having paragraphs+tables+headings all fit on one page. Some UI systems allow you to control how spill-over is handled, anywhere from cropping to resizing the font, so I'm curious if PDF supports anything of that sort.
Edit: ended up going with pagebreaks for simplicity, wasn't aware of that option when I wrote this question.
If you want to take control over the space taken by text, for instance to fit it on a single page, the way to go would be to create a ColumnText object and to add the content in simulation mode. If the text fits the page, add it for real. If it doesn't, use a smaller font size. This is demonstrated in the MovieAds example where snippets of text are fitted into AcroForm fields.
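The simulate-then-shrink loop described above can be sketched independently of iText. In the snippet below, `ShrinkToFit` and its `fits` predicate are hypothetical names; the predicate stands in for one simulated layout run. In iText 5 that run would presumably be a `ColumnText` filled with the content at the given font size, executed with `go(true)` and checked with `ColumnText.hasMoreText(status)`, but that wiring is an assumption and is not shown here.

```java
import java.util.function.DoublePredicate;

// Hypothetical helper: tries font sizes from large to small and returns the
// first one for which a simulated layout reports that everything fits.
final class ShrinkToFit {

    /**
     * @param maxSize preferred (largest) font size
     * @param minSize smallest size considered acceptable
     * @param step    amount to shrink by after each failed attempt
     * @param fits    simulated layout run: true if the content fits at that size
     * @return the largest fitting size, or -1 if nothing fits down to minSize
     */
    static double largestFittingSize(double maxSize, double minSize, double step,
                                     DoublePredicate fits) {
        for (double size = maxSize; size >= minSize; size -= step) {
            if (fits.test(size)) {
                return size; // add the content for real at this size
            }
        }
        return -1; // caller must truncate the text or move it to another page
    }
}
```

A return value of -1 is the point where you would fall back to cropping the text or, as in the edit to the question, to a page break.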

Modify character spacing in a PDF form field

I'm trying to build a web app to programmatically fill out a PDF form. I am going to configure my form first in Adobe Acrobat, then write a Java app with iText to fill out all the form fields via user input from the web. The base form I need to fill out comes from the US government. They created form fields with extremely large kerning (character spacing) values I need to change. However, there appears to be no way to modify this value in the Acrobat UI.
Does anyone know how to manipulate character spacing on form fields in Acrobat 8.0 for Windows? I could try to use iText to programmatically manipulate the kerning of the original document, but this would be much more tedious.
I believe I figured this out: what I took for kerning is called "combing" in Acrobat, and each of the form fields has been "combed". The strange thing is that this option isn't checked when I view the properties of the form field, but "combing" is the behaviour I was attempting to replicate.
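For background, a comb field does not change kerning as such: per the PDF specification, when the Comb flag (bit position 25 of the field's /Ff entry) is set and /MaxLen is present, the field is divided into MaxLen equally spaced cells with one character per cell. The sketch below only illustrates that arithmetic. The class and method names are hypothetical, and it ignores the other conditions the specification attaches to comb fields (for example that the field is not multiline or a password field).

```java
// Illustrative arithmetic for comb fields; not tied to any PDF library.
final class CombFieldSketch {

    /** Comb flag: bit position 25 of /Ff, i.e. the value 16777216. */
    static final int FF_COMB = 1 << 24;

    /** True if the flag is set and a positive MaxLen is available. */
    static boolean isComb(int fieldFlags, int maxLen) {
        return (fieldFlags & FF_COMB) != 0 && maxLen > 0;
    }

    /** Horizontal centre of each of the maxLen cells between llx and urx. */
    static double[] cellCenters(double llx, double urx, int maxLen) {
        double cellWidth = (urx - llx) / maxLen;
        double[] centers = new double[maxLen];
        for (int i = 0; i < maxLen; i++) {
            centers[i] = llx + cellWidth * (i + 0.5);
        }
        return centers;
    }
}
```

With a form library you would read the field's flags and MaxLen and then clear the flag or change MaxLen to alter the spacing; which calls to use for that depends on the library and is not shown here.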