How to get the underlined text from a PDF file?

Hi everyone!
I am trying to extract underlined text from a PDF file with iText, and it is proving very difficult. I have searched for a solution for a long time and have learned how to get a text's font family, font size, and location, but not whether it is underlined.
Looking forward to your help!
Thank you!

It might not be possible with iText, but you can achieve this with PDFBox to some extent.
Look at this: https://stackoverflow.com/a/40039407/4353762
But beware that it might not work in some cases: the library needs to know the font and the font's descriptor. If you give it a PDF whose font type is unknown, the descriptor will be null and the code will fail with a NullPointerException.
If you want to handle those NullPointerExceptions yourself, you might need to look at the underlines and strikeThrough methods of PDFStyledTextStripper.java.

Related

Font Not Displaying Properly in PDF

I am trying to save a PDF from Illustrator and I have never had this issue before. The font looks fine in Illustrator, but when I save the PDF and open it in a PDF viewer, the "i" character has a box beneath the text, although the dot of the i stays in place.
When viewed in illustrator:
When viewed in a PDF viewer:
I know that when the square shows up it means the font you are trying to use isn't there. However, the other characters appear fine; it just seems to be the "i", which is odd. The font passed verification (for reference, it is Playfair Display).
Does anyone know how to fix this or why this could be occurring? Am I exporting wrong? (I've never had this issue with exporting before.)
Thanks in advance!
Update: I solved my question while writing it. The font that was installed was a variable font type (I downloaded it from Google), for some reason it doesn't seem to want to play nicely in a pdf (maybe I'm saving it incorrectly?). I deleted the variable font and installed the static versions of the font and now the issue has gone away.
I don't know too much about variable fonts but it seems like they are maybe a bit finicky?
Hope this can help others!

How to write text under underline in itext7

I want to place text under an underline in a PDF. I am using iText 7. How can I do this?
I would be thankful if someone could point me in the right direction. Thanks in advance.
I have a Word document. It contains underlines that will be filled in by hand after printing. Under each underline there is annotation text that describes what should be written there.
I am trying to reproduce this Word document as a PDF using iText 7. Everything works except the annotation text under the underlines. I have googled it, but perhaps I used the wrong search terms, because I have not found a solution. Now I am not sure whether it is possible at all.
This is fairly easy to implement using the text rise setting, which is a standard setting in PDF. Below is a brief code sample:
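import com.itextpdf.layout.element.Paragraph;
import com.itextpdf.layout.element.Text;

Paragraph p = new Paragraph();
p.add("The beginning of the line ");
// A negative text rise shifts this small, underlined chunk below the baseline.
p.add(new Text(" (fill in your name) ").setTextRise(-10).setUnderline().setFontSize(8));
p.add(" end of the line");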
And here is what the output looks like if you add this paragraph to a document:

iTextSharp reverses Arabic text when filling combed text field

I have a problem with iTextSharp which looks like it could be a bug.
I have a combed text field and when using iTextSharp to add Arabic text to it, the Arabic letters initially appear reversed when the field is "highlighted". So 'ف ا د ي' appears 'ي د ا ف'.
The moment I click on the field, the highlight disappears and the text appears in the correct direction.
This happens regardless of the direction and alignment and only happens in combed text fields.
Can anyone offer any solutions to this?
Note: I've added the iText tag as well because I have a hunch that this issue is not specific to iTextSharp only and I hope I can replicate any workarounds or solutions in iTextSharp. Regards,
You can usually fix this by setting GenerateAppearances to false on the form object.
Annotations in a PDF (which form fields are a version of) can have different "states" and for each of these "states" you can specify how you want a renderer to display that state. For instance, a checkbox can either be "checked" or "not checked" which is given, but how to render that actual checkmark isn't. Maybe an "X", maybe a ✓, maybe a ☑ or maybe something totally different. These different states are called their Appearance State.
If you don't set an appearance state for an annotation then you are effectively surrendering control of that state to the PDF renderer and letting it do whatever it wants.
Adobe's renderers (Acrobat and Reader) are the de facto standard for PDF renderers and recent ones are actually really good at "filling in the blanks", especially when it comes to things like RTL and many non-English/Latin things. Other renderers out there, including Google's, Apple's, Microsoft's and even your printer might not be as good at this, however, so you might want to test this.

How to bold a text in PDF?

I'm adding a new feature to "my" program. It writes PDF files in a simple way, by producing a plain text file that contains some of the codes from the PDF standard.
I'm still trying to understand how it works, but my first problem is how to make a line of my document bold.
I've already downloaded the PDF Reference guide, but I haven't found anything about it.
Any idea?
PDF is not like HTML, where you can apply formatting tags for emphasis. As you've read in the PDF reference, all that you do in PDF is set up a graphics environment (colours used, fonts used, etc.) and then put text on the page.
If you want to have something show in bold, use a font that is bold. If you want to have something show in italic, use a font that is italic.
Older software used dirty tricks to create "bold-alike" text, but the good (and easy) way to do it is to make sure you select the correct font before you start drawing text.
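As a rough illustration of "select the correct font before you start drawing text", here is a minimal sketch in Java that builds the two pieces a hand-written PDF would need: a font resource dictionary and a page content stream. The class name, the method names, the resource names /F1 and /F2, and the coordinates are all hypothetical choices made for this example. Helvetica and Helvetica-Bold are two of the standard 14 fonts, so they need no embedding. The sketch only produces the text fragments; it does not write the surrounding objects, cross-reference table, or trailer.

```java
final class BoldTextSketch {
    /** Escapes the characters that are special inside a PDF literal string. */
    static String escape(String s) {
        return s.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)");
    }

    /** Font resources: /F1 is the regular face, /F2 the bold face (names are arbitrary). */
    static String fontResources() {
        return "<< /Font <<\n"
             + "  /F1 << /Type /Font /Subtype /Type1 /BaseFont /Helvetica >>\n"
             + "  /F2 << /Type /Font /Subtype /Type1 /BaseFont /Helvetica-Bold >>\n"
             + ">> >>";
    }

    /** Content stream that draws one regular line, then one bold line below it. */
    static String contentStream(String regular, String bold) {
        StringBuilder sb = new StringBuilder();
        sb.append("BT\n");                 // begin text object
        sb.append("/F1 12 Tf\n");          // select the regular font at 12 pt
        sb.append("72 720 Td\n");          // move to the first line (assumed position)
        sb.append("(").append(escape(regular)).append(") Tj\n");
        sb.append("/F2 12 Tf\n");          // "bold" is just a switch to the bold font
        sb.append("0 -16 Td\n");           // move down one line
        sb.append("(").append(escape(bold)).append(") Tj\n");
        sb.append("ET\n");                 // end text object
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(fontResources());
        System.out.println(contentStream("Regular text", "Bold text"));
    }
}
```

The only difference between the two lines in the content stream is the font selected with Tf before the text is shown.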

Scraping Text from PDF with underlines and strikethroughs

I have a PDF that contains many underlines and strikethroughs in the text. I would like to be able to convert this PDF to HTML. I have tried many different tools, and all of them will sometimes catch the underlines and strikethroughs as text formatting, and at other times will convert the underlines and strikethroughs to graphics, which is (as far as I can tell) useless to me.
I would really like to know how these programs differentiate between underlines that format text and underlines that are converted to graphics, and how I might be able to access the document and capture everything as text formatting.
I may be taking the wrong approach with this, and am open to any possible solutions, I think I just need to be pointed in the right direction.
Thank you in advance for any assistance.
There are no underlines and strikethroughs in PDF; there are just lines being drawn on top of the text.
PDF tools that detect underlines and strikethroughs will usually look for a line drawing that is close enough to the text, or use some other similar heuristic, then add the corresponding style to the text output when converting into another format. However, this kind of approach will never work in 100% of cases.
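To make the "line close enough to the text" idea concrete, here is a minimal sketch of such a heuristic in Java. It is not taken from any particular tool: the class, the types, and especially the thresholds (80% horizontal overlap, and the vertical bands expressed as fractions of the font size) are assumptions chosen for illustration. It assumes you have already extracted text runs and horizontal line segments with their coordinates in PDF user space, where y grows upward.

```java
final class UnderlineHeuristic {
    enum Decoration { NONE, UNDERLINE, STRIKETHROUGH }

    /** A run of text: horizontal extent, baseline height, and font size in points. */
    static final class TextRun {
        final double x1, x2, baselineY, fontSize;
        TextRun(double x1, double x2, double baselineY, double fontSize) {
            this.x1 = x1; this.x2 = x2; this.baselineY = baselineY; this.fontSize = fontSize;
        }
    }

    /** A horizontal stroke (or thin filled rectangle reduced to its center line). */
    static final class HLine {
        final double x1, x2, y;
        HLine(double x1, double x2, double y) { this.x1 = x1; this.x2 = x2; this.y = y; }
    }

    // Assumed thresholds; real documents will need tuning.
    static final double MIN_OVERLAP = 0.8;   // line must cover 80% of the run's width
    static final double UNDER_MIN = -0.25;   // underline band, in font sizes from the baseline
    static final double UNDER_MAX = 0.0;
    static final double STRIKE_MIN = 0.2;    // strikethrough band, roughly mid x-height
    static final double STRIKE_MAX = 0.45;

    static Decoration classify(TextRun run, HLine line) {
        double width = run.x2 - run.x1;
        double overlap = Math.min(run.x2, line.x2) - Math.max(run.x1, line.x1);
        if (width <= 0 || overlap < MIN_OVERLAP * width) {
            return Decoration.NONE;          // line does not span the text
        }
        // Vertical distance from the baseline, normalized by font size.
        double offset = (line.y - run.baselineY) / run.fontSize;
        if (offset >= UNDER_MIN && offset <= UNDER_MAX) return Decoration.UNDERLINE;
        if (offset >= STRIKE_MIN && offset <= STRIKE_MAX) return Decoration.STRIKETHROUGH;
        return Decoration.NONE;              // e.g. a table border or a page rule
    }

    public static void main(String[] args) {
        TextRun run = new TextRun(100, 150, 500, 12);
        System.out.println(classify(run, new HLine(100, 150, 498.5)));
        System.out.println(classify(run, new HLine(100, 150, 504)));
        System.out.println(classify(run, new HLine(50, 550, 480)));
    }
}
```

Fixed thresholds like these are exactly why different converters disagree: a line that falls just outside the band, or that is drawn as several short segments, ends up as a graphic instead of a text style.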