Ghostscript converted PDF not rendered correctly with pdfium - pdf

I have a problem with some PDF documents that I convert to PDF/A with Ghostscript. If the original document contains subsetted fonts, the converted document is not displayed correctly in Chrome (pdfium): the characters are shown as squares.
In Adobe PDF Reader the output is displayed correctly. Maybe the attached files can help you.
original PDF
converted PDF/A

Related

Extract Image from PDF correctly

I have a PDF file that contains an image, and the image is displayed correctly. When I try to extract the image from the PDF file using the itextsharp or pdfsharp libraries I get bytes, which I then decode successfully (because the stream has /Filter/FlateDecode). But when I try to convert these bytes to an image using various libraries, an exception occurs (it looks like the bytes are not actually an image). As far as I understand, the problem is in how I process these bytes, because the image in the PDF is not corrupted: it is shown there correctly. PDF is here.
The images are most likely stored in the PDF image format which is documented in the PDF specification.
It is rather simple to convert them to the Windows BMP format, but you still have to do the conversion yourself and add headers filled in from the image attributes in the PDF file (such as width, height and bits per component).
In PDF a new image line is byte-aligned, in Windows BMP it is DWORD-aligned.
Don't forget to extract the colour table if there is one.
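As a rough illustration of the alignment point, here is a minimal Python sketch that wraps already-decoded PDF image samples in a 24-bit BMP. It assumes the simplest case only: 8 bits per component, DeviceRGB, no colour table, and FlateDecode already applied. The function name `pdf_rows_to_bmp` is made up for this example; it is not part of itextsharp or pdfsharp.

```python
import struct


def pdf_rows_to_bmp(samples: bytes, width: int, height: int) -> bytes:
    """Wrap 8-bit DeviceRGB samples (assumed layout) in a 24-bit BMP file."""
    src_stride = width * 3                # PDF image rows are byte-aligned
    dst_stride = (src_stride + 3) & ~3    # BMP rows are padded to a DWORD
    if len(samples) < src_stride * height:
        raise ValueError("not enough sample data for the stated dimensions")

    rows = []
    # BMP stores rows bottom-up, so walk the PDF rows in reverse order.
    for y in range(height - 1, -1, -1):
        row = samples[y * src_stride:(y + 1) * src_stride]
        bgr = bytearray(row)
        # BMP expects blue, green, red; PDF DeviceRGB is red, green, blue.
        bgr[0::3], bgr[2::3] = row[2::3], row[0::3]
        rows.append(bytes(bgr) + b"\x00" * (dst_stride - src_stride))
    pixels = b"".join(rows)

    header_size = 14 + 40  # BITMAPFILEHEADER + BITMAPINFOHEADER
    file_header = struct.pack(
        "<2sIHHI", b"BM", header_size + len(pixels), 0, 0, header_size
    )
    info_header = struct.pack(
        "<IiiHHIIiiII",
        40, width, height, 1, 24, 0, len(pixels), 2835, 2835, 0, 0,
    )
    return file_header + info_header + pixels
```

The width and height would come from the /Width and /Height entries of the image dictionary. Indexed or greyscale images, or other bit depths, need a palette or a different bit count in the header and are not covered by this sketch.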

how do I extract the Arabic text of this PDF file correctly?

Today I tried to search for an Arabic word in a PDF file that contains Arabic content.
None of the PDF readers I tried can find any Arabic word in this PDF file.
So I dragged the PDF file into the Firefox browser, selected an area containing some words with Inspect Element, and saw this:
hw ½oiC instead of آخرین سخن
What type of encoding is used in this PDF file?
How can I convert this to normal text?
It's difficult to comment on the file you are looking at without seeing it, but a good starting point is to try Acrobat. Either copying the text and pasting it into a text editor, or searching for the text content, will reveal whether it can be extracted correctly or not.
If it can't be extracted properly then there's a good chance the font is lacking a ToUnicode entry (see Section 9.10.1 of the ISO PDF 32000-1:2008 specification for more information).
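To show what a ToUnicode entry provides, here is a minimal Python sketch that reads the `beginbfchar` sections of a ToUnicode CMap and uses them to turn character codes into text. It assumes single-byte character codes and ignores `beginbfrange` sections, so it is an illustration rather than a full CMap parser. The function names and the sample CMap in the usage note are invented for the example and are not taken from the file in the question.

```python
import re


def parse_bfchar(cmap_text: str) -> dict:
    """Collect code -> Unicode string pairs from bfchar sections."""
    mapping = {}
    for block in re.findall(r"beginbfchar(.*?)endbfchar", cmap_text, re.S):
        pairs = re.findall(r"<([0-9A-Fa-f]+)>\s*<([0-9A-Fa-f]+)>", block)
        for src, dst in pairs:
            # Destination values in a ToUnicode CMap are UTF-16BE.
            mapping[int(src, 16)] = bytes.fromhex(dst).decode("utf-16-be")
    return mapping


def decode_codes(codes: bytes, mapping: dict) -> str:
    """Map each single-byte code; unmapped codes become U+FFFD."""
    return "".join(mapping.get(code, "\ufffd") for code in codes)
```

With a made-up CMap such as `2 beginbfchar <01> <0622> <02> <062E> endbfchar`, `decode_codes(b"\x01\x02", parse_bfchar(...))` returns the two Arabic letters U+0622 and U+062E. Without such a table, a reader only has the raw codes, which is how output like the garbled text above can arise.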

How to create a pdf with tiff or png images with ghostscript?

The use of XFA inside PDF isn't only for creating forms.
In short: I need valid test cases for a new XFA PDF reader, but I couldn't find any, nor could I find out how to use Ghostscript to create such test cases in batch.
The point is that I don't know how to build the extra information Ghostscript should handle without a hex editor.
Ghostscript doesn't handle XFA at all, neither on input nor on output, so you cannot use Ghostscript to create XFA files.
Nor does Ghostscript (currently) create PDF files which consist solely of an image. Even if it did, these wouldn't be PNG or TIFF images, as those file formats are not directly supported by PDF. The next release of Ghostscript will contain devices which produce PDF files where the content is a rendered bitmap image created from the input, but the image won't be in either PNG or TIFF file format.
Note that XFA has been removed from the PDF 2.0 specification (hardly surprising, as it's an XML format, not PDF).

SyncFusion, Bad PDF Format, Inverted Color on Inputs

The state publishes PDF files for us to download. However, these files only look right in Adobe Acrobat. When I try to open them in PDFSharp or SyncFusion, it fails with an error saying that there are no pages. When I have PDFSharp or SyncFusion create a new file from the pages in the PDF, the result is the following:
However, if I save this document as an archivable PDF/A in Adobe Acrobat Pro, the file straightens out, and both PDFSharp and SyncFusion have no problems printing and viewing the resulting PDF file. I can't get SyncFusion to re-save the document as a PDF/A and fix it the way Adobe does.
What could cause this issue?
Edit: The PDF file says it was created by an Elixir program which converts open document formats to PDF. Possibly Librex.

Reformat image format in PDF

I have issues rendering a couple of images and some text in a PDF in the Telerik PDF viewer. According to Telerik's documentation, it seems the formats of that text and those images are incompatible.
Is there a way to convert the existing images in a PDF and put them back into the PDF, so that the file becomes compatible with the Telerik PDF viewer?
Many thanks