Apache PDFBox renderImageWithDPI UTF-8 issue

I am trying to export PDF pages to PNG using Apache PDFBox (2.0.8) with PDFRenderer.renderImageWithDPI. Everything works fine, except that when the text contains UTF-8 (non-ASCII) characters, the PNG shows boxes instead of the characters. If I extract the text content I can see the correct characters, so only the image rendering has this issue.
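Boxes in a rendered image usually mean the font used for drawing has no glyph for that character, which would also explain why text extraction still returns the right characters. Whether that is the cause here is an assumption; the question does not say whether the PDF embeds its fonts. As a rough diagnostic, the sketch below lists the code points of a string that a font-like predicate cannot display. The class and method names (`GlyphCheck`, `missingCodePoints`) are made up for illustration, and the ASCII-only predicate in `main` is a stand-in for a real font.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

final class GlyphCheck {
    // Returns every code point of `text` that the predicate reports as not
    // displayable, formatted as "U+XXXX", in order of appearance
    // (repeated characters are listed once per occurrence).
    static List<String> missingCodePoints(String text, IntPredicate canDisplay) {
        List<String> missing = new ArrayList<>();
        text.codePoints()
            .filter(cp -> !canDisplay.test(cp))
            .forEach(cp -> missing.add(String.format("U+%04X", cp)));
        return missing;
    }

    public static void main(String[] args) {
        // Stand-in predicate for illustration only: pretends the font covers ASCII.
        IntPredicate asciiOnly = cp -> cp < 128;
        // "Z\u00FCrich" is "Zurich" with a u-umlaut, written as an escape
        // so the source file stays ASCII.
        System.out.println(missingCodePoints("Z\u00FCrich", asciiOnly));
    }
}
```

To check a font installed on the machine, the predicate could be `cp -> font.canDisplay(cp)` for a `java.awt.Font` instance. That only tells you what Java's own font lookup sees; PDFBox may pick a different substitute font for a PDF that does not embed its fonts, so treat the result as a hint rather than proof.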

Related

Decreasing size of PDF when using puppeteer for pdf generation

We are using IDR to convert PDF documents to HTML.
After making some modifications, we use puppeteer to convert that document back to PDF, but the resulting files are larger than the originals (even if I don't modify the HTML at all).
For example, if the original file is 500 KB, I get a file of 1000 KB.
The page only contains some text.
Please help me understand the reason behind this and how to solve it.

How to remove overlays from PDF file using PDFBox?

I am using Apache Tika 1.17 to extract content from PDF files. There is a small image overlay on one page of the PDF, because of which Tika is not able to extract any content from that page; for the rest of the pages it works fine.
Is there any way to remove the overlay from the PDF page using PDFBox before sending it to Tika?
As a workaround, I converted the PDF to PNG so that Tika uses TesseractOCR to extract the content. But I am losing some content and text formatting this way.

Display PDF in iframe over SSL/HTTPS

I want to show a PDF inside an iframe so the user can preview it before downloading or printing the file.
I convert my report file to a PDF byte array and then display it.
Everything worked perfectly until I needed to serve it over SSL/HTTPS.
Because of that I have to switch my application to SSL/HTTPS. Can someone show me how to display the PDF in this setup?
Thanks for reading about my problem.
Here is my code:
reportDocument.Load(reportPath);
reportDocument.SetDataSource(dataSet);
_contentBytes = StreamToBytes(reportDocument.ExportToStream(ExportFormatType.PortableDocFormat));
.....
//setting header
.....
//then flush
stream.Flush();
I found the solution in
"PDF conversion suddenly fails if reading stylesheet from SSL".
The problem was the PDF reader inside my Chrome browser,
so I updated the Chrome reader by searching the store for a PDF viewer.
That's all.

JPG, png and gif "This file type is not supported" in joomla 3.2.1

Is it me or is this the new joomla release? It worked fine in 3.2 as far as I remember.
I tried png as well and gif. Different filenames.
For the unbelievers, my Legal Extensions setting is:
bmp,csv,doc,gif,ico,jpg,jpeg,odg,odp,ods,odt,pdf,png,ppt,swf,txt,xcf,xls,BMP,CSV,DOC,GIF,ICO,JPG,JPEG,ODG,ODP,ODS,ODT,PDF,PNG,PPT,SWF,TXT,XCF,XLS
I tried uploading an image named java.jpg like the one in your screenshot and it didn't work; then I tried php.jpg and that didn't work either. The same images with different filenames uploaded fine, so I suppose it's the filename (java, php) and not the file type (jpg) that causes the error.
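To make that observation concrete, here is a purely hypothetical check that would produce the same behaviour: it splits the filename on dots and rejects the upload if any segment equals a blocked word. This is not taken from Joomla's source; the class name, the method name and the two-word blocklist are assumptions chosen only to match the filenames reported above.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

final class UploadNameCheck {
    // Hypothetical blocklist, limited to the two names reported in this thread.
    private static final Set<String> BLOCKED =
            new HashSet<>(Arrays.asList("php", "java"));

    // Returns true if ANY dot-separated segment of the filename (not just
    // the final extension) equals a blocked word, ignoring case.
    static boolean isRejected(String filename) {
        for (String segment : filename.toLowerCase(Locale.ROOT).split("\\.")) {
            if (BLOCKED.contains(segment)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        for (String name : new String[] {"java.jpg", "php.jpg", "photo.jpg"}) {
            System.out.println(name + " rejected: " + isRejected(name));
        }
    }
}
```

With a check of this shape, php.jpg is rejected because its first segment is "php", even though jpg is a legal extension, while the same image renamed to photo.jpg passes.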

Why do ImageMagick-based thumbnail-images of PDF-files in Typo3 have black background?

Since version 6.7.5, ImageMagick has changed its colorspace from RGB to sRGB. Because of that I also had to change the setting [GFX][colorspace] = sRGB in the configuration of my Typo3 CMS software that makes use of ImageMagick. Everything is working fine again - except thumbnail-creation for PDF-files that now always have a black background (should be white).
It's possible to see all non-black elements of PDF-files (like images etc) on the thumbnails, but all the background that would usually be white is now black. This error only happens for PDF-files. All other image-thumbnails for JPG-, GIF- and PNG-files look as expected (even if they have transparent background).
Does anyone have an idea how I could solve this problem? Is this an ImageMagick-issue or a Typo3-Issue?
Based on "Creating JPG thumbnails from PDF causes problems with new version of ImageMagick" I was able to answer this question myself. If you want to apply this solution to Typo3, the following file changes are needed:
Go to your Typo3 directory, open the file ./t3lib/class.t3lib_stdgraphic.php and replace all occurrences of
$this->cmds['jpeg'] = '-colorspace ' . $this->colorspace . '
with
$this->cmds['jpeg'] = '-colorspace ' . $this->colorspace . ' -flatten
Use ImageMagick's convert with the "-flatten" option to fix the background. The actual PDF thumbnail conversion is passed to GhostScript as a delegate, which means both must be installed correctly on the server. If they are, which is the usual case, you do not need to set the GhostScript path in your code when using ImageMagick. The PDF encoding itself can also be a problem. There are shared hosting accounts running 7-year-old versions of both ImageMagick and GhostScript that convert PDFs to thumbnails without problems under the current stable TYPO3 CMS 6.1.1.
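For testing the "-flatten" behaviour outside TYPO3, the sketch below assembles an equivalent convert command line. The class name, the file paths and the explicit "-background white" are illustrative assumptions and not part of the TYPO3 change above; "[0]" is ImageMagick's syntax for selecting the first page of the PDF.

```java
import java.util.ArrayList;
import java.util.List;

final class PdfThumbnailCommand {
    // Builds the argument list for an ImageMagick call that renders the
    // first page of a PDF to a JPEG, flattened onto a white background.
    static List<String> build(String pdfPath, String jpgPath, String colorspace) {
        List<String> cmd = new ArrayList<>();
        cmd.add("convert");
        cmd.add(pdfPath + "[0]");   // first page only
        cmd.add("-colorspace");
        cmd.add(colorspace);        // e.g. sRGB, as in the [GFX][colorspace] setting
        cmd.add("-background");
        cmd.add("white");           // assumption: make the fill colour explicit
        cmd.add("-flatten");        // composite transparent areas onto the background
        cmd.add(jpgPath);
        return cmd;
    }

    public static void main(String[] args) {
        // Hypothetical paths; only prints the command, does not run it.
        System.out.println(String.join(" ", build("test.pdf", "thumb.jpg", "sRGB")));
    }
}
```

The list can be handed to `new ProcessBuilder(cmd).inheritIO().start()` on a server where ImageMagick and GhostScript are installed. Comparing the output with and without "-flatten" should show whether the black background comes from the PDF's transparency.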
A free direct download PDF test file with detailed instructions to solve a "PDF Thumbnail Generation Problem" in TYPO3 CMS is available at Smargasy, Inc. "http://www.smargasy.com/fileadmin/media_data/community/Smargasy_PDF-Thumbnail-Compatibility-Test-File.pdf". The test file helps isolate the problem on systems that use ImageMagick and GhostScript as the image processing and conversion program in a shared hosting environment.
Best Regards