Font properties look the same, but the look and feel differs between two PDF documents - pdf

I have a PDF template document given by my client as a reference, and my application has to recreate it. When I compared the PDF generated by my application with the template given by the client, I could see that the look and feel are slightly different. I opened both in Adobe Acrobat Pro DC, and the font properties are still reported as the same. Please see the links below for the PDF header I created, which has the above-mentioned issue.
Please see the image:
I have the font properties defined in XML as below. This will be unmarshalled later.
<font name="Arial Narrow" id="blackBold" fsize="12" type="bold" rgb="0,0,0"/>
I have the different font types defined in a map, like below:
fontAndAlignmentMap.put("bold", 1);
//Internally this will be resolved to-->com.itextpdf.text.Font.BOLD = 1;
The iText Font is created from the unmarshalled XML font object as below:
FontFactory.registerDirectories();
com.itextpdf.text.Font itextFont =
FontFactory.getFont(sbcFont.getName(), sbcFont.getFsize(), fontAndAlignmentMap.get(sbcFont.getType()), new BaseColor(r,g,b));
The 10-page PDF document has different fonts and formatting. The statically defined fonts are unmarshalled and loaded into the itextFontMap as shown below.
itextFontMap.put(sbcFont.getId(), itextFont);
The values of sbcFont.getId() are "blackBold", "blackRegular", etc. Depending on the use, the appropriate font is pulled from this map.
Phrase pSbc = new Phrase("Summary of Benefits and Coverage:",
        SBCMappingSingletonBuilder.itextFontMap.get("blackBold"));
Phrase pSbcDesc = new Phrase(
        " What this Plan Covers & What You Pay For Covered Services",
        SBCMappingSingletonBuilder.itextFontMap.get("blackRegular"));
When I printed the family name of each com.itextpdf.text.Font object, I got:
blackBold.getFamilyname() :Arial Narrow
blackRegular.getFamilyname() :Arial Narrow
And this is what the code printed for sbcFont.getName():
sbcFont.getName() :Arial Narrow Bold
sbcFont.getName() :Arial Narrow
The Phrases are added to a PdfPCell, the PdfPCell is added to a PdfPTable, and the table is then added to the document.
Can anyone please shed some light on this? I am using Java 1.6 and itextpdf-5.3.4.
The sample template I am referring to: please click here.
The one I am trying to create, similar to the template: please click here.
I opened the PDF in Adobe Acrobat Pro. The bold font looks narrow, even though Adobe Acrobat Pro shows the font properties as bold. But when I opened the PDF in a browser, it looks good.

Related

Tagged annotation link not reading properly in PDF with JAWS?

What is the problem?
Using PDFBox, I tried to tag a link that contains an annotation. The sample code that creates the PDF is here. I tagged a link nested in a paragraph using PDFBox. The newly created tagged PDF passes the Adobe checker.
What I observed?
After tagging the PDF, I tried to read it using JAWS. Unfortunately, JAWS does not read the links in either "entire document" mode or "read currently visible page" mode. I then inspected the StructTreeRoot and compared the PDF tagged by Adobe with the PDF tagged by us. The ParentTree (number tree) does not match the one in the Adobe-tagged PDF.
Tagged by me
Tagged by Adobe
What I Tried?
I tried to replicate the Adobe number tree in my PDF. I was able to create almost the same tree, except for one object.
In the image above, the left side is tagged by Adobe and the right side is created by me. I don't understand why Adobe created the entire structure tree under this (142 0 R) object. I am adding the annotation object to the number tree using this code:
private void addWidgetContent(PDObjectReference objectReference, PDStructureElement fieldElem, String type, int pageIndex) {
    COSDictionary annotDict = new COSDictionary();
    COSArray annotArray = new COSArray();
    annotArray.add(COSInteger.get(currentMCID));
    annotArray.add(objectReference);
    annotDict.setItem(COSName.K, annotArray);
    annotDict.setString(COSName.LANG, "EN-US");
    annotDict.setItem(COSName.P, currentElem.getCOSObject());
    annotDict.setItem(COSName.PG, pages.get(pageIndex).getCOSObject());
    annotDict.setName(COSName.S, type);
    annotDicts.add(annotDict);
    setNextMarkedContentDictionary();
    numDictionaries.add(annotDict);
    fieldElem.appendKid(objectReference);
    currentElem.appendKid(fieldElem);
}
And one more bug I saw in Adobe preflight is
How can I fix these bugs, and what is the correct way of tagging a link annotation so that JAWS reads it? Please help me.
I have an update to share. I have now created the parent tree root without any errors. Here is the tagged PDF file.
JAWS still does not read the links in this tagged PDF. Why?

How to get rid of unwanted extra pages when converting a google document to pdf via google-apps-script?

I have an old script that (among other things) converts a google document to pdf.
It used to work ok, but now two extra blank pages appear in the pdf version of the file.
I just discovered that this problem also affects the "download as pdf" menu option in google documents. There are a number of workarounds in that case, but I need a workaround for google-apps-script.
In this post the solution to a similar problem seems to involve a fine tuning of the page size. I tried something like that, but it does not trivially apply.
I also tried some other (kind of random) variations for the page size and margins, but to no avail.
Below I'm pasting a minimal working example. It should create a document file "test" and its pdf version "test.pdf" in your main drive folder.
Any help getting rid of the two extra pages is greatly appreciated.
Thanks
function myFunction() {
  // this function
  // - creates a google document "test",
  // - writes "this is a test" inside it
  // - saves and closes the document
  // - creates a pdf version of the document, called "test.pdf"
  //
  // the conversion is ok, except two extra blank pages appear in the pdf version.

  // create google document
  var doc = DocumentApp.create('test');
  var docFile = DriveApp.getFileById( doc.getId() );

  // set margins (I need landscape layout)
  // this is an attempt to a solution, inspired by https://stackoverflow.com/questions/18426817/extra-blank-page-when-converting-html-to-pdf
  var body = doc.getBody();
  body.setPageHeight(595.2).setPageWidth(841.8);
  var mrg = 40; // in points
  body.setMarginTop(mrg).setMarginBottom(mrg);
  body.setMarginLeft(mrg).setMarginRight(mrg);

  // write something
  body.appendParagraph('this is a test').setHeading(DocumentApp.ParagraphHeading.HEADING2).setAlignment(DocumentApp.HorizontalAlignment.CENTER);

  // save and close file
  doc.saveAndClose();

  // convert file to pdf
  var docblob = docFile.getAs('application/pdf');

  // set pdf name
  docblob.setName("test.pdf");

  // save pdf file
  var file = DriveApp.createFile(docblob);
}
I found the source of the problem and a solution in this post on the google product forum, dating 8 months back.
The extra pages appear in the pdf if the option in view -> print layout is not checked.
I did some further tests, with my accounts and my colleagues'.
The results are consistent:
1. when view -> print layout is not checked, two extra pages appear in the pdf version of the document.
2. when view -> print layout is checked, the pdf version of the document has the expected number of pages.
3. this setting also affects the DocumentApp services in Google Apps Script. That is: the above script produces the expected pdf version only if the "view -> print layout" option in Google Documents is checked.
I do not see how this behaviour could be a "feature", so I think it's a bug. By the way "print layout" does not seem to have any visible effect on my documents (other than messing up the pdf version). I'm surprised that after 8 months the bug is still out there.
Number 3 above surprised me, because I did not think that an option set manually in a (any) google document would affect my scripts.
I'm currently looking for a way of setting the "print layout" option from inside the script. So far I had no luck with that.

Why won't my docbook links work when linking to something earlier in the document?

I'm going from docbook to fo to pdf and I need to have text that goes to different parts of the document when clicked. I'm using the following format
<link linkend="M1350424Trace">
  <emphasis role="bold">Link To Trace</emphasis>
</link>
Where M1350424Trace is the id of a paragraph. It works how I want it to work when I'm linking to something that comes later in the document but not when it's trying to link to something that comes earlier. Why is that?
Here's the .fo
<fo:basic-link internal-destination="M1350424Trace">
  <fo:inline>
    <fo:inline font-weight="bold">Link To Trace</fo:inline>
  </fo:inline>
</fo:basic-link>
I've reproduced your case with the following input data:
A sample DocBook file with link and xref as the linking mechanisms.
The link, the xref and the target id are all on different pages after rendering.
DocBook 5.1 as the source.
DocBook XSL 1.79.1 as the stylesheets.
FOP 2.3 as the renderer.
The produced PDF file was tested with xpdf 3.04 and Acrobat Reader 7.08 on Linux Debian 9.5.
It works as it should: clicking the link moves to the page containing the required element (para) with the required id.
So you need to provide the exact file and the exact configuration that produced this strange behavior. You can fill it with dummy data in case your document is under NDA or contains sensitive information, BUT it should have the same structure and elements as the original one.
I'll try to reproduce your case.

iText - Create link to embedded image in PDF/A-3

I've created a PDF/A-3 document with attached image files using iText 5.5.4. What I need is to add links in the body of the document to directly open the images.
I tried this to create the links :
PdfAnnotation linktoimg = PdfAnnotation.createFileAttachment
(writer, rectangle, "Open picture", fileSpec);
writer.addAnnotation(linktoimg);
Compilation is OK but at run time I get a PdfAConformanceException :
Annotation type /FileAttachment not allowed
I also tried to add an anchor to open the images, but I found that the ISO 32000-1 specification doesn't support it. And the gotoEmbedded functions only work for attached PDF files.
So is there a way to achieve this or am I facing a limitation with PDF/A?
This is not a limitation of PDF/A-3 (or PDF/A-2). In fact, you have uncovered a bug in the iText PDF/A implementation. FileAttachment annotations are disallowed in PDF/A-1, but not in PDF/A-2 and PDF/A-3.
I have pushed a fix. It will be available in the develop branch of the iText repository on GitHub soon. Alternatively, if you don't want to build from source, you can download a snapshot build from the iText repository.

How to detect if a pdf is text searchable or non text searchable?

I have a set of PDFs, from which I want to process (in VB.NET) only those which are not text searchable. Can you please tell me how to go about this?
Generally speaking, the way to do this is open up each page and rip the content stream and see if any text operators are executed that place text on the page.
Let me explain what that means - PDF content is a small RPN language that contains operations that mark the page in some way. For example, you might see something like this:
BT 72 400 Td /F0 12 Tf (Throatwarbler Mangrove) Tj ET
Which means:
Begin a text area
Set the position of the text baseline to (72, 400) in PDF units
Set the font to a resource named F0 from the current page's font resource dictionary
Draw the text "Throatwarbler Mangrove"
End a text area
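To make the "operands first, operator last" order concrete, here is a minimal Java sketch (my own illustration, not taken from any PDF library) that groups each operator in the example above with the operands that precede it. The names ContentStreamOps and group are hypothetical, and the sketch only understands numbers, names and literal strings; arrays, hex strings and dictionaries are deliberately left out.

```java
import java.util.ArrayList;
import java.util.List;

final class ContentStreamOps {

    /** Returns one "operator [operands]" entry per operator, in stream order. */
    static List<String> group(String content) {
        List<String> result = new ArrayList<String>();
        List<String> operands = new ArrayList<String>();
        int i = 0;
        int n = content.length();
        while (i < n) {
            char c = content.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;
                continue;
            }
            int start = i;
            if (c == '(') {
                // Literal string operand: read up to the matching ')',
                // honouring nested parentheses and backslash escapes.
                int depth = 0;
                do {
                    char d = content.charAt(i);
                    if (d == '\\') {
                        i++; // skip the escaped character
                    } else if (d == '(') {
                        depth++;
                    } else if (d == ')') {
                        depth--;
                    }
                    i++;
                } while (i < n && depth > 0);
                operands.add(content.substring(start, Math.min(i, n)));
                continue;
            }
            while (i < n && !Character.isWhitespace(content.charAt(i))
                    && content.charAt(i) != '(') {
                i++;
            }
            String token = content.substring(start, i);
            if (isOperand(token)) {
                operands.add(token);
            } else {
                // An operator consumes every operand collected so far.
                result.add(token + " " + operands);
                operands.clear();
            }
        }
        return result;
    }

    /** Assumption for this sketch: operands are numbers or /Names only. */
    private static boolean isOperand(String token) {
        char c = token.charAt(0);
        return c == '/' || c == '-' || c == '+' || c == '.' || Character.isDigit(c);
    }
}
```

Calling ContentStreamOps.group on the example line yields five entries, one each for BT, Td, Tf, Tj and ET, with Td carrying the two coordinates and Tj carrying the string.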
So you can try shortcuts:
Does my page's resource dictionary contain any fonts?
This will fail in some cases because some PDF generation tools put fonts into the resource dictionary and don't use them (false positive). It will also fail if the page content contains a Form XObject which contains text (false negative).
Does my page's content stream have BT/ET operators?
This will get you closer, but will fail if there is no content in them (false positive) or if they're not present, but there's a Form XObject which contains text (false negative).
So really, the thing to do is to execute the entire page's content stream, including recursing on all XObject to look for text operators.
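As a rough sketch of the per-stream part of that check, the plain Java below scans one content stream for the text-showing operators (Tj, TJ, ' and "), skipping string literals, hex strings, names and comments so that a "Tj" inside a string does not count. It is a sketch under assumptions, not a full implementation: it assumes the stream has already been decoded into a String by whatever PDF library you use, it does not recurse into XObjects, and it ignores inline images. The names ContentStreamTextScan and showsText are hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

final class ContentStreamTextScan {

    // Operators that actually paint text: Tj, TJ, ' and ".
    private static final Set<String> TEXT_SHOWING =
            new HashSet<String>(Arrays.asList("Tj", "TJ", "'", "\""));

    /** True if the decoded content stream contains a text-showing operator. */
    static boolean showsText(String content) {
        int i = 0;
        int n = content.length();
        while (i < n) {
            char c = content.charAt(i);
            if (c == '(') {
                i = skipLiteralString(content, i);
            } else if (c == '<') {
                if (i + 1 < n && content.charAt(i + 1) == '<') {
                    i += 2; // "<<" opens a dictionary, not a hex string
                } else {
                    int end = content.indexOf('>', i);
                    i = end < 0 ? n : end + 1; // skip the hex string
                }
            } else if (c == '%') {
                // Comment: skip to the end of the line.
                while (i < n && content.charAt(i) != '\n' && content.charAt(i) != '\r') {
                    i++;
                }
            } else if (c == '/') {
                // Name object: skip it so that a name like /Tj is not an operator.
                i++;
                while (i < n && !isDelimiterOrSpace(content.charAt(i))) {
                    i++;
                }
            } else if (isDelimiterOrSpace(c)) {
                i++;
            } else {
                int start = i;
                while (i < n && !isDelimiterOrSpace(content.charAt(i))) {
                    i++;
                }
                if (TEXT_SHOWING.contains(content.substring(start, i))) {
                    return true;
                }
            }
        }
        return false;
    }

    /** Returns the index just past the ')' that closes the string opened at 'open'. */
    private static int skipLiteralString(String s, int open) {
        int depth = 0;
        int i = open;
        while (i < s.length()) {
            char c = s.charAt(i);
            if (c == '\\') {
                i += 2; // skip the escaped character
                continue;
            }
            if (c == '(') {
                depth++;
            } else if (c == ')') {
                depth--;
                if (depth == 0) {
                    return i + 1;
                }
            }
            i++;
        }
        return s.length(); // unterminated string
    }

    private static boolean isDelimiterOrSpace(char c) {
        return Character.isWhitespace(c) || c == '\0' || "()<>[]{}/%".indexOf(c) >= 0;
    }
}
```

An empty BT/ET pair returns false here, which avoids the false positive mentioned above. To cover the false negative, the same scan would have to be run on the content stream of every Form XObject the page references.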
Now, there's another approach that you can take using Atalasoft's software (disclaimer: I work for Atalasoft and have written most of the PDF handling code; I also worked on Acrobat versions 1-4). Instead of asking "does this page contain any text?", you can ask "does this page contain only a single image?"
bool allPagesImages = true;
using (Document doc = new Document(inputStream))
{
    foreach (Page p in doc.Pages)
    {
        if (!p.SingleImageOnly)
        {
            allPagesImages = false;
            break;
        }
    }
}
This will leave allPagesImages with a pretty decent indication that each page is all images, which, if you're looking to OCR the non-searchable documents, is probably what you really want.
The down side is that this will be a very high price for a single predicate, but it also gets you a PDF rasterizer and the ability to extract the images directly out of the file.
Now, I have no doubt that a solid engineer could work their way through the PDF spec and write some code to extend iTextPdfSharp to do this task. I think that if I sat down with it, I might be able to write that predicate in a few days, but I already know most of the PDF spec. So it might take you more like two weeks to a month. So, your choice.
I think this option is worth considering, though I haven't tested the code yet: it may be possible to do this by reading the properties of each PDF file that you want to process.
You might check this link:
http://www.codeguru.com/columns/vb/manipulating-pdf-files-with-itextsharp-and-vb.net-2012.htm
You have to read the producer property of each file before you process it. That's only an example. My advice is to include your code here so we can try to help you. Bless you.