PDFBox 2.0.11 - slow PDF generation after adding an external font

I have a performance problem when generating PDFs after adding and using an external font. The way I do it is standard (which is why I am not posting the full code):
I load the font with:
PDType0Font.load(new PDDocument(), fontInputStream, false)
Then, for each AcroForm, I add the font to the default resources with:
PDResources defaultResources = acroForm.getDefaultResources();
COSName fontCOSName = defaultResources.add(font);
Has anyone had a similar problem, or does anyone know how I can resolve it?

Related

Editing the metadata of a PDF file using PDFBox

I am able to edit the metadata of a PDF file using PDFBox 3.x. However, at the final step I am forced to save the changes to a different file.
Is there a way to write the changes back to the original file itself?
Sample code:
PDDocument doc = Loader.loadPDF(inFileObj);
PDDocumentInformation pdd = doc.getDocumentInformation();
pdd.setAuthor("Mr. Stack Overflow");
// ...
doc.save(outFileObj); // This works, however doc.save(inFileObj) makes the PDF document a blank document
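One common workaround for this kind of problem can be sketched as follows, on the assumption that the blank result comes from the original file being overwritten while the document is still being read from it: write the output to a temporary file in the same directory, then move it over the original. The class `InPlaceSave`, the interface `DocumentWriter`, and the method `replaceFile` are hypothetical names introduced only for this illustration; they are not part of PDFBox.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

final class InPlaceSave {

    /** Hypothetical callback that writes the updated document to a stream. */
    interface DocumentWriter {
        void writeTo(OutputStream out) throws IOException;
    }

    /**
     * Writes the new content to a temporary file next to the target and then
     * replaces the target, so the original is never truncated mid-read.
     */
    static void replaceFile(Path target, DocumentWriter writer) throws IOException {
        Path dir = target.toAbsolutePath().getParent();
        Path temp = Files.createTempFile(dir, "pdf-edit-", ".tmp");
        try {
            try (OutputStream out = Files.newOutputStream(temp)) {
                writer.writeTo(out);
            }
            // Swap the finished temporary file in place of the original.
            Files.move(temp, target, StandardCopyOption.REPLACE_EXISTING);
        } finally {
            // No-op after a successful move; cleans up if anything failed.
            Files.deleteIfExists(temp);
        }
    }
}
```

With the sample code above, the call might look like `InPlaceSave.replaceFile(inFileObj.toPath(), doc::save);`, since `PDDocument` has a `save(OutputStream)` overload. On some platforms the move may fail while the original file is still open, in which case the document may need to be closed between writing the temporary file and replacing the original.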

iText 7 Chinese characters and merge with existing pdf template

I have to rephrase my question. My request is straightforward: I want to display Asian characters in a PDF file generated with iText 7.
So far I have downloaded the NotoSansCJKsc-Regular.otf file and assigned its path to a variable. Below is my code:
public static string FONT = @"D:\Projects\Resources\NotoSansCJKsc-Regular.otf";
PdfWriter writer = new PdfWriter(@"C:\temp\test.pdf");
PdfDocument pdfDoc = new PdfDocument(writer);
Document doc = new Document(pdfDoc, PageSize.A4);
PdfFont fontChinese = PdfFontFactory.CreateFont(FONT, PdfEncodings.IDENTITY_H);
doc.SetFont(fontChinese);
The issue I am facing is that whenever the code reaches this line:
PdfFont fontChinese = PdfFontFactory.CreateFont(FONT, PdfEncodings.IDENTITY_H);
I always get this error: "The request could not be performed because of an I/O device error." The error doesn't make sense to me and I am struggling to find a solution. Has anyone here had a similar issue? The code is in C#.
Many thanks.
I can confirm that the above code works as expected. The .otf file that I originally downloaded was corrupted, which is why I got the above error.
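A quick sanity check on the downloaded font file can catch some kinds of corruption before the file is handed to a PDF library. The sketch below is in Java rather than C#, uses only the standard library, and relies on the fact that OpenType/TrueType files begin with one of a few 4-byte signatures ("OTTO" for CFF-based .otf files such as Noto Sans CJK, 0x00010000 or "true" for TrueType outlines, "ttcf" for collections). The class and method names are hypothetical, and a matching signature does not prove that the rest of the file is intact.

```java
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

final class FontFileCheck {

    /** Returns true if the file starts with a known OpenType/TrueType signature. */
    static boolean hasFontSignature(Path fontFile) throws IOException {
        try (DataInputStream in = new DataInputStream(Files.newInputStream(fontFile))) {
            int tag = in.readInt();        // first four bytes, big-endian
            return tag == 0x4F54544F       // "OTTO": OpenType with CFF outlines
                || tag == 0x00010000       // TrueType outlines
                || tag == 0x74727565       // "true": legacy Apple TrueType
                || tag == 0x74746366;      // "ttcf": font collection
        } catch (EOFException e) {
            return false;                  // shorter than four bytes: not a font
        }
    }
}
```

A file that fails this check (for example an HTML error page saved with an .otf extension, or a truncated download) is worth downloading again before debugging the PDF code itself.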

Syncfusion: Image from URL not showing in HTML to PDF conversion

I'm generating a PDF from HTML, which works fine, except that my images are not showing. The images come from a URL: <img src="https://image.shutterstock.com/image-photo/large-beautiful-drops-transparent-rain-260nw-668593321.jpg" /> I think the images have not finished loading before the conversion starts. How can I delay the conversion, or otherwise ensure that the image is rendered in the PDF?
WebKitConverterSettings settings = new WebKitConverterSettings();
var baseUrl = url;
settings.PdfHeader = HeaderHTMLtoPDF(url);
settings.PdfFooter = FooterHTMLtoPDF(url);
//Set WebKit path
var contentRoot = _configuration.GetValue<string>(WebHostDefaults.ContentRootKey);
settings.WebKitPath = Path.Combine(contentRoot, "QtBinariesWindows");
settings.Margin.Top = 30;
//Assign WebKit settings to HTML converter
htmlConverter.ConverterSettings = settings;
var pdfViewUrl = $"{baseUrl}/api/pdf";
Task<PdfDocument> convertPdfTask = Task<PdfDocument>.Factory.StartNew(() => htmlConverter.Convert(pdfViewUrl));
PdfDocument document = convertPdfTask.Result;
MemoryStream stream = new MemoryStream();
document.Save(stream);
document.Close(true);
We have checked the conversion with the details provided, and it works properly. We created a simple HTML page containing the provided image, and it was converted to a PDF document correctly.
The missing image may be caused by missing OpenSSL assemblies. To access resources from an HTTPS site, the HTML converter requires the OpenSSL assemblies, so please make sure they are available on the machine where the conversion takes place. Please refer to the links below for more details:
Prerequisites - https://help.syncfusion.com/file-formats/pdf/convert-html-to-pdf/webkit#openssl
Troubleshooting - https://help.syncfusion.com/file-formats/pdf/converting-html-to-pdf#troubleshooting
Alternatively, the OpenSSL assemblies listed below can be placed in the Windows system folder of the machine (on a 64-bit machine, place them in $SystemDrive\Windows\SysWOW64; on a 32-bit machine, place them in $SystemDrive\Windows\System32):
libeay32.dll
libssl32.dll
ssleay32.dll
Note: I work for Syncfusion.
An alternative solution is to convert your image to Base64 so that the image is embedded inline in the HTML.
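The Base64 approach can be sketched as below. The example is in Java and uses only the standard library (the thread itself is C#, where `Convert.ToBase64String` plays the same role). It builds an `<img>` tag whose `src` is a `data:` URI, so the converter does not need to fetch anything over HTTPS. The class and method names are hypothetical, and it is an assumption that the converter in use renders `data:` URIs.

```java
import java.util.Base64;

final class InlineImage {

    /** Builds an img tag that embeds the image bytes as a Base64 data URI. */
    static String toImgTag(byte[] imageBytes, String mimeType) {
        String encoded = Base64.getEncoder().encodeToString(imageBytes);
        return "<img src=\"data:" + mimeType + ";base64," + encoded + "\" />";
    }
}
```

For a JPEG, the image bytes would be downloaded or read from disk once and passed in with the MIME type `image/jpeg`. The trade-off is a larger HTML document, since Base64 adds roughly one third to the image size.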

Remove PDFont caching with Apache Tika

I am trying to extract only the text from a number of different documents (RTF, DOC, PDF). I naturally turned to Apache Tika because it can autodetect the document type and extract text accordingly. I am only interested in the text, not the formatting.
My application ends up with a big memory leak and, on investigation, it comes from caching in the PDFont class of the PDFBox dependency. I am not interested in caching font metrics or other font formatting data from PDFs, as I only want to extract the text.
I am using Tika 1.12. Does anyone know how to get around this caching issue? This is how I am using the AutoDetectParser:
AutoDetectParser parser = new AutoDetectParser();
BodyContentHandler handler = new BodyContentHandler(-1);
Metadata metadata = new Metadata();
FileInputStream inputstream = new FileInputStream(new File(child.getPath()));
ParseContext context = new ParseContext();
parser.parse(inputstream, handler, metadata, context);
String s=null;
s =handler.toString();
handler=null;
context=null;
inputstream.close();
PDFont.clearResources();
So I fudged a workaround and just called System.gc() every time a file had finished being processed. It works a treat, but it doesn't really answer the question.

Java Print encoding with Sun PDFRenderer

I'm a beginner in Java programming and also new here at Stack Overflow. Currently I'm trying to print PDF files with the com.sun.pdfview library. It works most of the time, but with some documents I get the following error:
java.lang.IllegalArgumentException: Unknown encoding: SymbolSetEncoding
at com.sun.pdfview.font.PDFFontEncoding.getBaseEncoding(PDFFontEncoding.java:199)
at com.sun.pdfview.font.PDFFontEncoding.<init>(PDFFontEncoding.java:78)
at com.sun.pdfview.font.PDFFont.getFont(PDFFont.java:133)
at com.sun.pdfview.PDFParser.getFontFrom(PDFParser.java:1166)
at com.sun.pdfview.PDFParser.iterate(PDFParser.java:719)
at com.sun.pdfview.BaseWatchable.run(BaseWatchable.java:101)
at java.lang.Thread.run(Thread.java:722)
I should mention that these documents are written in a Caucasian language (Georgian) and the typical font is Sylfaen.
The error occurs in the following code:
PDFRenderer pgs = new PDFRenderer(page, g2, imgbounds, null,null);
try {
page.waitForFinish();
pgs.run();
I believe that these documents need a different encoding, or that I need to specify the font. Unfortunately, I couldn't find a place where I can inspect or change this setting.
Thank you
Martin
PDFRenderer only supports a limited subset of the PDF spec.