I am initialising an NSAttributedString with RTF and it works, in that I can see the text and attributes like font, size and weight are preserved.
NSMutableAttributedString* attributedRTFString = [NSMutableAttributedString.alloc initWithRTF:rtfContent documentAttributes:nil];
Some attributes, notably colour, are lost though. If I look at the RTF code I can see that those attributes are there (for example)…
{\*\htmltag72 </p>}
{\*\htmltag64 <p class=MsoNormal>}\htmlrtf {\htmlrtf0
{\*\htmltag84 <b>}\htmlrtf {\b \htmlrtf0
{\*\htmltag148 <span style='font-size:10.0pt;color:#5F5F5F'>}\htmlrtf {\htmlrtf0
{\*\htmltag156 </span>}\htmlrtf }\htmlrtf0
{\*\htmltag92 </b>}\htmlrtf }\htmlrtf0
{\*\htmltag148 <span style='color:#1F497D'>}\htmlrtf {\htmlrtf0
{\*\htmltag244 <o:p>}
{\*\htmltag252 </o:p>}
{\*\htmltag156 </span>}\htmlrtf }\htmlrtf0 \htmlrtf\par}\htmlrtf0
\htmlrtf \par
\htmlrtf0
And I can see the colours in the original document.
How should I be opening RTF to preserve the formatting fully?
Related
I'm trying to extract the format of a word document containing text in different fonts and font-sizes, images, comments etc. I have used zipfile module to extract the XML files of the word document.
XML files are:
['[Content_Types].xml',
'_rels/.rels',
'word/_rels/document.xml.rels',
'word/document.xml',
'word/footer2.xml',
'word/header1.xml',
'word/footer1.xml',
'word/endnotes.xml',
'word/footnotes.xml',
'word/_rels/header1.xml.rels',
'word/header2.xml',
'word/_rels/header2.xml.rels',
'word/embeddings/Microsoft_Word_97_-_2003_Document1.doc',
'word/media/image3.wmf',
'word/media/image2.emf',
'word/theme/theme1.xml',
'word/media/image1.png',
'word/embeddings/oleObject1.bin',
'word/comments.xml',
'word/settings.xml',
'word/styles.xml',
'customXml/itemProps1.xml',
'word/numbering.xml',
'customXml/_rels/item1.xml.rels',
'customXml/item1.xml',
'docProps/app.xml',
'word/stylesWithEffects.xml',
'word/webSettings.xml',
'word/fontTable.xml',
'docProps/core.xml',
'docProps/custom.xml']
I'm unable to understand the styles associated with the content present in word/document.xml.
I'm trying to encapsulate the results in the following manner:
{
"text": "some-text-in-document",
"font": "some-font",
"font_size": 10,
"some_field": "some-more-value",
...
}
Tried using python-docx to get the fonts and font-sizes but mostly the value is None
here's the code snippet:
from docx.enum.style import WD_STYLE_TYPE
styles = document.styles
#print(styles.default)
paragraph_styles = [s for s in styles if s.type == WD_STYLE_TYPE.PARAGRAPH]
for style in paragraph_styles:
#print(style.font.name)
if(style.font.name):
print(style.font.name, style.font.size)
for paragraph in document.paragraphs:
#print(paragraph.text)
for run in paragraph.runs:
print(run.text)
font = run.style.font
print(font.size)
Results are mostly None for font and size.
A value of None for style means Normal.
All paragraphs have a style, it's just that most have the same style, so Word doesn't spell it out for that majority case, perhaps to save space.
We're using iText to put a text inside a signature placeholder in a PDF. We use a code snippet similar to this to define the Signature Appearence
PdfStamper stp = PdfStamper.createSignature(inputReader, os, '\0', tempFile2, true);
sap = stp.getSignatureAppearance();
sap.setVisibleSignature(placeholder);
sap.setRenderingMode(PdfSignatureAppearance.RenderingMode.DESCRIPTION);
sap.setCertificationLevel(PdfSignatureAppearance.NOT_CERTIFIED);
Calendar cal = Calendar.getInstance();
sap.setSignDate(cal);
sap.setLayer2Text(text+"\n"+cal.getTime().toString());
sap.setReason(text+"\n"+cal.getTime().toString()); `
Everything works fine, but the signature text does not fill all the signature placeholder area as expected by us, but the area filled seems to have an height that is approximately the 70% of the available space.
As a result, sometimes especially if the length of the signature text is quite big, the signature text does not fit in the placeholder and the text is striped away.
Example of filled Signature:
I looked into the PdfSignatureAppearence class and I found this code snippet in the getApperance() method that is responsible of this behaviour and is invoked when
sap.setRenderingMode(PdfSignatureAppearance.RenderingMode.DESCRIPTION);
is being called
else {
dataRect = new Rectangle(
MARGIN,
MARGIN,
rect.getWidth() - MARGIN,
rect.getHeight() * (1 - TOP_SECTION) - MARGIN);
}
I don't get the reason for that, because I expect that the text could use all the available placeholder height, with the proper margin.
Is there any way to bypass this behaviour?
We are using iText 5.4.2, but also newer version contains same code snippet so I expect that the behaviour will be same.
As #JJ. already commented,
TOP_SECTION is connected with acro6layers rendering and the code [determining the datarect in pure DESCRIPTION mode] does not take into account the value of the acro6layer flag.
Unless one wants to fix this in the iText 5 code itself, the easiest way to make one's description use the whole signature space is to construct the layer 2 appearance oneself.
To do so one merely has to retrieve a PdfTemplate from PdfSignatureAppearance.getLayer(2) and fill it as desired after one has called PdfSignatureAppearance.setVisibleSignature. The PdfSignatureAppearance remembers that you already have retrieved the layer 2 and doesn't change it anymore.
For the case at hand we essentially copy the PdfSignatureAppearance.getAppearance code for generating layer 2 in pure DESCRIPTION mode, merely correcting the code determining the datarect:
PdfSignatureAppearance appearance = ...;
[...]
appearance.setVisibleSignature(new Rectangle(36, 748, 144, 780), 1, "sig");
PdfTemplate layer2 = appearance.getLayer(2);
String text = "We're using iText to put a text inside a signature placeholder in a PDF. "
+ "We use a code snippet similar to this to define the Signature Appearence.\n"
+ "Everything works fine, but the signature text does not fill all the signature "
+ "placeholder area as expected by us, but the area filled seems to have an height "
+ "that is approximately the 70% of the available space.\n"
+ "As a result, sometimes especially if the length of the signature text is quite "
+ "big, the signature text does not fit in the placeholder and the text is striped "
+ "away.";
Font font = new Font();
float size = font.getSize();
final float MARGIN = 2;
Rectangle dataRect = new Rectangle(
MARGIN,
MARGIN,
appearance.getRect().getWidth() - MARGIN,
appearance.getRect().getHeight() - MARGIN);
if (size <= 0) {
Rectangle sr = new Rectangle(dataRect.getWidth(), dataRect.getHeight());
size = ColumnText.fitText(font, text, sr, 12, appearance.getRunDirection());
}
ColumnText ct = new ColumnText(layer2);
ct.setRunDirection(appearance.getRunDirection());
ct.setSimpleColumn(new Phrase(text, font), dataRect.getLeft(), dataRect.getBottom(), dataRect.getRight(), dataRect.getTop(), size, Element.ALIGN_LEFT);
ct.go();
(CreateSignature.java test signWithCustomLayer2)
(As description text I used some paragraphs from the question body.)
The result:
By adapting the MARGIN value in the code above, one can even use more are. As that can result in the text touching the border, though, that might not be really beautiful.
As an aside:
if the length of the signature text is quite big, the signature text does not fit in the placeholder and the text is striped away.
If you initialize the size variable above with a non-positive value, the code in the if (size <= 0) block will calculate a font size which allows all of the text to fit into the signature rectangle. This does happen in the code above as new Font() returns a font with a size of UNDEFINED which is a constant -1.
I have a pdf with a grey background on which there are few PDFields. I have set the value for these fields using:
PDAcroForm form = catalog.getAcroForm();
if(null != form){
//Field on top of grey background
PDField idField = form.getField("id");
if (null != idField) {
idField.setValue(value);
}
//Other fields outside grey background
}
Everything works as expected. However, the form fields are not visible when opening in ios' default PDF reader. So I'm trying to flatten the pdf.
Now, when I try to flatten the pdf using form.flatten(), the text of the PDFields on top of the grey background become white (it was black before flattening).
And just for experimenting, I tried flattening only the other fields in the PDF (and skip flattening idField) using form.flatten(fieldsToFlatten, false). When I do this, the fields on the grey background completely disappear.
Why is the font color changing when I try to flatten the pdf? Is there anyway to retain the original font color?
Thanks.
I have a PDF where I add some TextFields.
var txtFld = new TextField(stamper.Writer, new Rectangle(cRightX - cWidthX, cTopY3, cRightX, cTopY), FieldNameProtocol) { Font = bf, FontSize = cHeaderFontSize, Alignment = Element.ALIGN_RIGHT, Options = PdfFormField.FF_MULTILINE };
stamper.AddAnnotation(txtFld.GetTextField(), 1);
The ‘bf’ above is a Unicode font that gets embedded in the PDF:
BaseFont bf = BaseFont.CreateFont(UnicodeFontPath, BaseFont.IDENTITY_H, BaseFont.EMBEDDED); // Create a Unicode font to write in Greek...
Later-on I fill those fields with greek text.
var acrof = stamper.AcroFields;
acrof.SetField(fieldName, field.Value/*, field.Value*/); // Set the text of the form field.
acrof.SetFieldProperty(fieldName, "setfflags", PdfFormField.FF_READ_ONLY, null); // Make it readonly.
When I view the PDF, most of the times the text is missing and if I click on the (invisible) TextField in Acrobat, then the text becomes visible (until it loses focus again).
Any idea what is going on here?
I have also tried using non-embedded font, but I get the same thing (although I still seem to get embedded fonts in PDF that are similar to the font I use). I don't know if I am missing sth.
It seemed that I was doing the following at the wrong order (the following is the correct):
acrof.SetFieldProperty(field.Name, "setfflags", PdfFormField.FF_READ_ONLY, null); // Make it readonly.
acrof.SetFieldProperty(field.Name, "textfont", bf, null);
acrof.SetField(field.Name, field.Value/*, field.Value*/); // Set the text of the form field.
At least that's hat I think it was wrong.
I have made many changes.
How do I figure out the font family and the font size of the words in a pdf document? We are actually trying to generate a pdf document programmatically using iText, but we are not sure how to find out the font family and the font size of the original document which needs to be generated. document properties doesn't seem to contain this information
Fonts are stored in the catalog (I suppose in a sub-catalog of type font). If you open a pdf as a text file, you should be able to find catalog entries (they begin and end with "<<" and ">>" respectively.
On a simple pdf file, i found the following:
<</Type/Font/BaseFont/Helvetica-Bold/Subtype/Type1/Encoding/WinAnsiEncoding>>
thus searching for the prefix should help you (in some pdf files, there are spaces between
the commponents but '/Type /Font' should be ok).
Of course this is a manual process, while you would probably prefer an automatic one.
On another note, we sometime use identifont or what the font to find uncommon fonts that give us problem (logo font).
regards
Guillaume
Edit : the following code will find all font in the pages. To be short, you search the dictionnary of each page for the subdictionnary "ressource" and then the subdictionnary "font". Each entry in the later is a font dictionnary, describing a font.
PdfReader reader = new PdfReader(
new FileInputStream(new File("file.pdf")));
int nbmax = reader.getNumberOfPages();
System.out.println("nb pages " + nbmax);
for (int i = 1; i <= nbmax; i++) {
System.out.println("----------------------------------------");
System.out.println("Page " + i);
PdfDictionary dico = reader.getPageN(i);
PdfDictionary ressource = dico.getAsDict(PdfName.RESOURCES);
PdfDictionary font = ressource.getAsDict(PdfName.FONT);
// we got the page fonts
Set keys = font.getKeys();
Iterator it = keys.iterator();
while (it.hasNext()) {
PdfName name = (PdfName) it.next();
PdfDictionary fontdict = font.getAsDict(name);
PdfObject typeFont = fontdict.getDirectObject(PdfName.SUBTYPE);
PdfObject baseFont = fontdict.getDirectObject(PdfName.BASEFONT);
System.out.println(baseFont.toString());
}
}
The name (variable "name" in the following code) is what is used in the text to change font. In the PDF, you'll have to find it next to a text. The following number is the size. Here for example, it's size 12. (sorry, still no code for this part).
BT
/F13 12 Tf
288 720 Td
the text to find Tj
ET
Depending on the PDF, if it hasn't been outlined you may be able to open it in Adobe Illustrator, double click the text and select some of it to see it's font family, size, etc.
If the text is outlined then use one of those online tools that PATRY suggests to find out the font.
Good luck
If you have Adobe Acrobat you can see the fonts inside and examine the objects and text streams. I wrote a blog post on this at http://pdf.jpedal.org/java-pdf-blog/bid/10479/Viewing-PDF-objects