Can I find bordercolor of a field in PDF using iText? - pdf

Is there any way to find the border color of a specific field in my PDF using the latest version of iText? I can get the AcroFields.Item, but I don't see an option to get the border color from there.

Please take a look at this PDF: text_fields.pdf. This PDF was created using the TextFields example. The following code snippet was used to set the border of the field with name text_2:
text.setBorderStyle(PdfBorderDictionary.STYLE_SOLID);
text.setBorderColor(BaseColor.BLUE);
text.setBorderWidth(2);
Now when we look inside the PDF using iText RUPS, and we take a look at the field dictionary / widget annotation for this field, we see the following structure:
We see a /BS dictionary that defines a solid border style (the value for the /S key is /S) and a border width (/W) with value 2.
We also see that the border color (/BC) entry of the /MK entry is an array with three values: [ 0 0 1 ]. This means that the border color is an RGB color where the value for Red is 0, the value for Green is 0, and the value for Blue is 1. This is consistent with us setting the color to BaseColor.BLUE when we created the file.
You say that you have the AcroFields.Item object for a field. Now you need to get the merged field / widget annotation dictionary and follow the path shown by iText RUPS:
AcroFields.Item item = acroFields.getFieldItem(fldName);
PdfDictionary merged = item.getMerged(0);
PdfDictionary mk = merged.getAsDict(PdfName.MK);
PdfArray bc = mk.getAsArray(PdfName.BC);
The values stored in the array bc tell you the border color. If the array has one value, it is a gray color; if it has three, an RGB color; if it has four, a CMYK color.
Warning: some entries may not be present (e.g. there may be no /MK dictionary or no /BC entry). In that case getAsDict or getAsArray returns null, and using the result without a null check throws a NullPointerException.
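As a minimal sketch of the interpretation step, the following plain-Java helper names the color space from the number of components. It deliberately uses no iText types: the class BorderColorSketch and the method describe are hypothetical names, and the float[] is assumed to have been copied out of bc beforehand (for example via bc.getAsNumber(i).floatValue()), with null standing in for a missing /MK or /BC entry.

```java
import java.util.Arrays;

/** Plain-Java sketch; the class and method names are illustrative, not part of iText. */
final class BorderColorSketch {

    /**
     * Names the color space implied by the number of components of a /BC array.
     * Pass null when the /MK dictionary or its /BC entry is missing.
     */
    static String describe(float[] bc) {
        if (bc == null) {
            return "not set";
        }
        switch (bc.length) {
            case 0:  return "transparent";
            case 1:  return "gray " + Arrays.toString(bc);
            case 3:  return "RGB " + Arrays.toString(bc);
            case 4:  return "CMYK " + Arrays.toString(bc);
            default: return "unexpected component count: " + bc.length;
        }
    }

    public static void main(String[] args) {
        // The blue border from the example file, and a field without /BC.
        System.out.println(describe(new float[] {0f, 0f, 1f}));
        System.out.println(describe(null));
    }
}
```

Handling null explicitly in one place keeps the caller free of scattered null checks.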

Related

How to remove all text color attributes from a QTextDocument?

I've got a QTextDocument read from an HTML file; given a QString of HTML data named topicFileData, I do topicFileTextDocument.setHtml(topicFileData);. I then want to strip off all of the color information, making the whole document just use the default foreground and background brush. (I do not want to explicitly set the text to be black text on a white background; I want to remove the color information from the document.) (Background info: the reason I need to do this is that there are spans within the document that are erroneously set with a black foreground color, rather than just having no color information set, and that causes those spans to display as black-on-black when my app is running in "dark mode", when Qt changes the default text background brush to be black instead of white.)
Here's what I tried:
QTextCursor tc(&topicFileTextDocument);
tc.select(QTextCursor::Document);
QTextCharFormat noColorFormat;
noColorFormat.clearForeground();
noColorFormat.clearBackground();
tc.mergeCharFormat(noColorFormat);
This does not work, unfortunately; it looks like mergeCharFormat() does not understand that I want the clearForeground() and clearBackground() actions to be merged in to strip off those attributes.
I can do tc.setCharFormat(noColorFormat); instead, of course, and that does strip off the color attributes correctly; but it also obliterates all of the other character format info (font, etc.), which is not acceptable.
So, ideally I'd like to find an API that lets me explicitly remove a given text attribute from a QTextDocument. Alternatively, I guess I need to loop through all the spans of the QTextDocument one by one, get the char format of the current span, remove the color attributes from the format, and set the modified format back onto the span. That would be fine; but I have no idea how to loop over spans in that way. Thanks for any help.
Instead of creating a new instance of QTextCharFormat, update the current format and reapply it on the QTextEdit:
default = QTextCharFormat()
charFormat = self.textCursor().charFormat()
charFormat.setBackground(default.background())
charFormat.setForeground(default.foreground())
self.textCursor().mergeCharFormat(charFormat)
A sub-optimal solution that I have found as a workaround is to actually edit the HTML data string before I create the QTextDocument, using a regex:
topicFileData.replace(QRegularExpression("(;? ?color: ?#[0-9a-f][0-9a-f][0-9a-f][0-9a-f][0-9a-f][0-9a-f])"), "");
This works for my situation, because all of the colors in my HTML file are set with color: #XXXXXX style attributes that can be stripped out of the HTML itself. This is fragile, however; colors specified in other ways would not be stripped, and if the body text of the HTML document happened to contain text that matched the regex, the regex would modify it and thus corrupt the content of the document. So I don't recommend this solution, and I won't be accepting it. If somebody can offer a better solution that would be preferable.

Get annotation from a pdf to add to another document

I am using iTextSharp version 5.0.
For my project, I need to copy my PDF document into another PDF document using PdfWriter. I can't use PdfCopy or PdfStamper.
So all the annotations get lost during this operation.
To begin, I tried to find out how to get the annotation of a "pencil comment drawing markup", as shown below in the Adobe Reader UI:
For my tests, I am using this PDF document with a drawing markup I added myself: https://easyupload.io/3c6i1g
I found how to get the annotation dictionary:
Dim pdfReader As New PdfReader(pdfPath)
Dim page As PdfDictionary = pdfReader.GetPageN(0)
Dim annots As PdfArray = page.GetAsArray(PdfName.ANNOTS)
If annots IsNot Nothing Then
    For i = 0 To annots.Size - 1
        Dim annotDict As PdfDictionary = annots.GetAsDict(i)
        Dim annotContents As PdfString = annotDict.GetAsString(PdfName.CONTENT)
        Dim annotSubtype As PdfString = annotDict.GetAsString(PdfName.SUBTYPE)
        Dim annotName As PdfString = annotDict.GetAsString(PdfName.T)
    Next
End If
When the loop reaches my comment, the annotName variable returns my name, so I am sure I am looking at the right annotation, but annotSubtype is Nothing. How is that possible? According to the PDF specification, section 12.5.2, table 1666 (https://www.adobe.com/content/dam/acom/en/devnet/pdf/pdfs/PDF32000_2008.pdf), the Subtype entry is required, so shouldn't it be something other than Nothing?
Also, how can I get the image related to this annotation? I thought it would be stored in the content of the annotation dictionary but this is also returning nothing in the code above...
About why I can't use PdfStamper in the first place: one of the pages of my PDF document must be resized (downscaled) in order to add some text at the bottom of the page, so I must use PdfWriter for that.
Question: How can I get the drawn line of a comment annotation with iTextSharp 5.0?
There are a lot of single questions in your post...
When the loop reaches my comment, the annotName variable returns my name, so I am sure I am looking at the right annotation, but annotSubtype is Nothing. How is that possible?
According to the PDF specification, section 12.5.2, table 1666 (https://www.adobe.com/content/dam/acom/en/devnet/pdf/pdfs/PDF32000_2008.pdf), the Subtype entry is required, so shouldn't it be something other than Nothing?
According to table 164 in section 12.5.2 of ISO 32000-1, the Subtype entry is indeed required, but it is also specified to be a name, while you try to retrieve a string instead:
Dim annotSubtype As PdfString = annotDict.GetAsString(PdfName.SUBTYPE)
As the Subtype entry of that annotation in your PDF correctly is a name, GetAsString returns Nothing.
Thus, call GetAsName instead and expect a PdfName return type.
Also, how can I get the image related to this annotation? I thought it would be stored in the content of the annotation dictionary but this is also returning nothing in the code above...
The Contents entry is specified in the same table as above to be optional and (if present) to have a text string value containing "Text that shall be displayed for the annotation or, if this type of annotation does not display text, an alternate description of the annotation’s contents in human-readable form." As the annotation is merely a scribble, what should the annotation have as its Contents value?
As your annotation actually is an Ink annotation, you can find the representation of the scribble in the required InkList and optional BS entries of the annotation, see table 182 of section 12.5.6.13 of ISO 32000-1.
The value of InkList is "An array of n arrays, each representing a stroked path. Each array shall be a series of alternating horizontal and vertical coordinates in default user space, specifying points along the path. When drawn, the points shall be connected by straight lines or curves in an implementation-dependent way."
The value of BS (if present) is "A border style dictionary (see Table 166) specifying the line width and dash pattern that shall be used in drawing the paths."
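To illustrate the InkList layout described above, here is a small plain-Java sketch that turns one stroke, given as alternating x and y values, into a list of points. It does not use iTextSharp; the class InkListSketch, the method toPoints, and the sample coordinates are made up for illustration, and the numbers are assumed to have been read out of the annotation's InkList array already.

```java
import java.util.ArrayList;
import java.util.List;

/** Plain-Java sketch; names and sample data are illustrative only. */
final class InkListSketch {

    /** Converts one stroke, given as [x1, y1, x2, y2, ...], into a list of {x, y} points. */
    static List<double[]> toPoints(double[] stroke) {
        if (stroke.length % 2 != 0) {
            throw new IllegalArgumentException("odd number of coordinates: " + stroke.length);
        }
        List<double[]> points = new ArrayList<>();
        for (int i = 0; i < stroke.length; i += 2) {
            points.add(new double[] {stroke[i], stroke[i + 1]});
        }
        return points;
    }

    public static void main(String[] args) {
        // Two hypothetical strokes, i.e. an InkList with n = 2 inner arrays.
        double[][] inkList = { {10, 20, 15, 25, 30, 22}, {40, 40, 45, 50} };
        for (double[] stroke : inkList) {
            System.out.println(toPoints(stroke).size() + " points in stroke");
        }
    }
}
```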
Beware, though: The annotation dictionary’s AP entry, if present, takes precedence over the InkList and BS entries. And in your PDF the annotation has an appearance entry. So the actually displayed content is that of the Normal appearance stream which contains vector graphics instructions drawing your scribble.
About why I can't use PdfStamper in the first place: one of the pages of my PDF document must be resized (downscaled) in order to add some text at the bottom of the page, so I must use PdfWriter for that.
First of all, this only means that you have to do something special to that special page, there is no need to damage all pages by copying them with a PdfWriter. You could manipulate that single page in a separate document, then use PdfCopy to copy the pages before that page from the original PDF, then that page from the separate PDF, and then all pages after that page from the original again.
Thus, you'd only have to fix the annotations of that special page, the annotations on the other pages could remain untouched.
Furthermore, you can even use the PdfStamper if you are ready to use low-level iText routines. In particular, before stamping you can apply the static PdfReader method GetPageContent to the page dictionary of the special page to retrieve the page content as a byte array, build a new byte array from it in which you prepend an affine transformation that does the downscaling, and set the new byte array as the content of the page in question using the SetPageContent method of the underlying PdfReader.
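The byte-array step can be sketched in plain Java as follows. This is only an illustration of prepending a cm operator (the PDF operator that concatenates a matrix to the current transformation matrix); the class ScaleContentSketch, the method prependScale, and the chosen scale and offsets are assumptions, and reading and writing the page content through iText is left out.

```java
import java.nio.charset.StandardCharsets;
import java.util.Locale;

/** Plain-Java sketch; names and values are illustrative only. */
final class ScaleContentSketch {

    /**
     * Returns the content with "s 0 0 s tx ty cm" prepended, so that everything
     * drawn by the original instructions is scaled by s and shifted by (tx, ty).
     */
    static byte[] prependScale(byte[] content, double scale, double tx, double ty) {
        // Locale.ROOT guarantees a '.' as decimal separator, as PDF syntax requires.
        String prefix = String.format(Locale.ROOT, "%.4f 0 0 %.4f %.4f %.4f cm\n",
                scale, scale, tx, ty);
        byte[] head = prefix.getBytes(StandardCharsets.US_ASCII);
        byte[] result = new byte[head.length + content.length];
        System.arraycopy(head, 0, result, 0, head.length);
        System.arraycopy(content, 0, result, head.length, content.length);
        return result;
    }

    public static void main(String[] args) {
        byte[] original = "BT ET".getBytes(StandardCharsets.US_ASCII);
        // Scale to 90% and move the page content up by 72 units to free space at the bottom.
        byte[] scaled = prependScale(original, 0.9, 0, 72);
        System.out.println(new String(scaled, StandardCharsets.US_ASCII));
    }
}
```

Any text added at the bottom afterwards would have to be drawn outside the scaled content, for example in a separate layer added by the stamper.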
Even in this scenario, though, you'd have to adjust the annotation coordinates (both of their rectangles and of other coordinates like the InkList in your case)...
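A minimal plain-Java sketch of that coordinate adjustment, assuming the page content was transformed with a uniform scale s and an offset (tx, ty): every x becomes s*x + tx and every y becomes s*y + ty. The class ScaleCoordsSketch and the method transform are hypothetical; the same helper would apply to a Rect array [llx lly urx ury] and to each inner array of the InkList, since both alternate x and y values.

```java
/** Plain-Java sketch; names are illustrative only. */
final class ScaleCoordsSketch {

    /** Applies x' = scale*x + tx and y' = scale*y + ty to an array of alternating x, y values. */
    static double[] transform(double[] coords, double scale, double tx, double ty) {
        double[] out = new double[coords.length];
        for (int i = 0; i < coords.length; i++) {
            // Even indexes hold x values, odd indexes hold y values.
            double shift = (i % 2 == 0) ? tx : ty;
            out[i] = scale * coords[i] + shift;
        }
        return out;
    }

    public static void main(String[] args) {
        double[] rect = {100, 200, 300, 400};
        System.out.println(java.util.Arrays.toString(transform(rect, 0.5, 0, 50)));
    }
}
```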
Question: How can I get the drawn line of a comment annotation with iTextSharp 5.0?
See above, the annotation of the scribble is an Ink annotation and the drawn path is specified in the InkList and BS entries of its dictionary and additionally instantiated in its normal appearance stream.

Getting Font Size of NSAttributed string

I'm trying to get the font size of an NSAttributedString, however, I'm having trouble doing this.
How do you get the font size of an NSAttributedString in objective-c?
An attributed string does not have "a font size". The font is an attribute, which can vary over the string. Look at your question: there are two different fonts in one paragraph (one for the usual text and one for the keywords).
Therefore you can only ask for the existing attributes (including the font) at a specific location, i.e. -attributesAtIndex:effectiveRange: does this job for you. The attribute key for the font is NSFontAttributeName, and the size is the pointSize of the font you get back. If you do not find this key in the attributes dictionary, the default applies: Helvetica (Neue), 12 pt.

How should I determine the height of an NSAttributedString for a fixed width

I have an attributed string with "Date: Heading - Description."
The Date, Heading, and Description text values are dynamically generated, and all have different attributed fonts which I've assigned to the appropriate substring.
How should I determine the total width, and consequently the height that this content fills for a given width?
I've found this SO post but it is from 2010 and a more current solution seems like it would be available.
Additionally, how should the width of an attributed string with content changing text style by substring be determined? The iOS7 sizeWithAttributes method seems to apply the attributes for the entire string.
This is an attributed string, so the place to look is the NSAttributedString documentation. In particular:
https://developer.apple.com/library/ios/documentation/uikit/reference/NSAttributedString_UIKit_Additions/Reference/Reference.html#//apple_ref/occ/instm/NSAttributedString/boundingRectWithSize:options:context:
As to your last paragraph: simply make this a mutable attributed string, change the styles as desired, and proceed as above.

How to put term payload pairs to a lucene document

I have a list of terms and associated payloads. How can I put these into a lucene document or rather a field?
Here is my list:
List<MyTerm> list = new List<MyTerm>(){
    new MyTerm(){
        Text = "apple",
        Payload = BitConverter.GetBytes(2)
    },
    new MyTerm(){
        Text = "juice",
        Payload = BitConverter.GetBytes(5)
    }};
I guess I have to use the following constructor of a field.
Field(string name, TokenStream tokenStream);
But how to build the required tokenStream from my list?
Edit
I want to search by terms. The payloads are needed for custom scoring.
My terms are the dominant colors of an image, and I want to store the percentage of each color for scoring when searching by color. If someone searches for red images, images with a lot of red in them should score higher than images with less red.
Edit
I should mention that one image can have multiple dominant colors. Furthermore, I want to be able to search for images by multiple dominant colors. For example, I want to retrieve images that have a lot of red and a lot of blue. Thus I guess putting colors and intensities into different fields is not an option.
OK, based on your explanation I would suggest using 2 Fields - one for the term (dominant color) and one for the payload (intensity of color) - and sort the results on payload. This could look like this:
Field color = new Field("Color", colorString, Field.Store.NO, Field.Index.ANALYZED);
Field intensity = new Field("Intensity", intensityString, Field.Store.NO, Field.Index.NOT_ANALYZED);
The color field is used for querying, intensity for sorting.
If you want to, store the values in the index. Depends on your further needs.
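To make the intended ordering concrete, here is a plain-Java sketch that ranks images in memory, independent of Lucene. It is not Lucene API: the class ColorRankingSketch, its methods, and the choice to sum the percentages of all queried colors are assumptions made for illustration. An image matches only if it has every queried color.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

/** Plain-Java sketch of the desired ranking; not tied to any Lucene class. */
final class ColorRankingSketch {

    /** Sum of the stored percentages for the queried colors; -1 if any queried color is absent. */
    static int score(Map<String, Integer> imageColors, List<String> queried) {
        int total = 0;
        for (String color : queried) {
            Integer percent = imageColors.get(color);
            if (percent == null) {
                return -1;
            }
            total += percent;
        }
        return total;
    }

    /** Returns the names of the matching images, highest score first. */
    static List<String> rank(Map<String, Map<String, Integer>> images, List<String> queried) {
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, Map<String, Integer>> entry : images.entrySet()) {
            if (score(entry.getValue(), queried) >= 0) {
                hits.add(entry.getKey());
            }
        }
        hits.sort(Comparator.comparingInt(
                (String name) -> score(images.get(name), queried)).reversed());
        return hits;
    }
}
```

With several colors per image, a single sortable intensity value per document is no longer enough, which is why this sketch computes the score per query.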
BTW: Please use the edit function to update your original question with your additional information.