I converted a Smart Form output into PDF using the function module SX_OBJECT_CONVERT_OTF_PDF.
My problem is that when the language is PL (Polish), the PDF file is 10 times bigger than it is for the EN (English) language. Why?
Gunstick's answer is probably right.
SAP Note 843480 discusses this issue.
As of release 620, there are support packages that enable PDF elements (such as fonts) to be compressed. The resulting PDF will still be larger than the English-only one, but probably less than 10 times larger.
It may be that Polish uses a specific font (for its special characters) which is not installed by default on an OS, so the PDF converter embeds the complete font in the document in order to render it correctly at the destination.
This is just speculation though.
You may try this one: http://lucattelli.com/blog/?page_id=478
This FM takes the binary PDF, converts it to Base64, and sends it as a mail attachment.
See if it helps.
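If you just need the Base64 step outside of ABAP, it is a one-liner in most languages. A minimal Java sketch of the same idea (the file name is a placeholder):

```java
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Base64;

public class PdfToBase64 {
    public static void main(String[] args) throws Exception {
        // Read the binary PDF (path is a placeholder) and Base64-encode it,
        // which is the transformation the linked FM performs before
        // attaching the PDF to a mail.
        byte[] pdfBytes = Files.readAllBytes(Paths.get("smartform.pdf"));
        String base64 = Base64.getEncoder().encodeToString(pdfBytes);

        // Print just the start of the encoded string as a sanity check.
        System.out.println(base64.substring(0, Math.min(80, base64.length())));
    }
}
```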
Is there a limit to how many metadata keyword characters are saved in a PDF? I am saving part numbers under work instructions designed in Microsoft Publisher. The field retains well past 255 characters, but when I export the document to PDF the keywords get cut off.
Thank you for all your help.
According to this accepted answer in the Adobe Forums:
In the Acrobat Family, all fields on the main Document Properties tab are limited to 1999 characters or less. Fields on the Custom Properties tab are limited to 255 characters.
However, there is no size limit mentioned in the PDF Specification, so how much is saved and displayed depends on the implementation of your PDF application.
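One quick way to verify this yourself is to write a long keyword string with a PDF library and read it back: if the full string survives the round trip, the truncation is happening in the viewer, not in the file. A minimal sketch using Apache PDFBox 2.x (an assumption on my part; any PDF library with metadata access would do):

```java
import java.io.File;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;

public class KeywordRoundTrip {
    public static void main(String[] args) throws Exception {
        // Build a one-page PDF with a keyword string well past 255 characters.
        StringBuilder keywords = new StringBuilder();
        for (int i = 0; i < 100; i++) keywords.append("PART-").append(i).append(", ");

        try (PDDocument doc = new PDDocument()) {
            doc.addPage(new PDPage());
            doc.getDocumentInformation().setKeywords(keywords.toString());
            doc.save("keywords-test.pdf");
        }

        // Read it back: the full string is stored in the file itself,
        // regardless of how much a given viewer chooses to display.
        try (PDDocument doc = PDDocument.load(new File("keywords-test.pdf"))) {
            System.out.println(doc.getDocumentInformation().getKeywords().length());
        }
    }
}
```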
I have a PDF which was produced by Ghostscript 8.15. I need to process this PDF with my software, which extracts font names from the PDF file and then performs some operations. But when I extract font names from this PDF file, the names are not what they should be. For example, the original font name is 'NOORIN05', but the PDF file contains 'TTE25A5F90t00'. How can I decode these font names back to the original names? All fonts are TTF.
NOTE:
Why I need to extract the fonts.
There is a piece of software named InPage which was very popular in India and Pakistan for writing documents in Urdu, because before word processors had Unicode support it was the only practical way to type Urdu on a computer. Due to the complexity of Urdu, this software uses 89 font files named NOORIN01 to NOORIN89. The reason for so many font files is to cover all the Urdu ligatures, of which there are more than 19,000; each file can contain only 255 ligatures, so this was the technique used before Unicode.

Copying and pasting text from a PDF generated by this software produces garbage in MS Word, for the reason given above: the 89 font files. So there was no way to extract text from such old PDF files. (Nowadays this software supports Unicode, but I am talking about old files.) I therefore developed a piece of software in C# to extract text from these old PDFs. My algorithm uses a database file that lists the names of the 89 font files with all the ASCII codes, and in the next column the corresponding Urdu ligature in Unicode. I process the PDF file character by character along with its font, match the font name against my database, fetch the Unicode ligature, and display it in a text box. In this way I get the Unicode text successfully.

My software was working fine on many PDF files. But a few days ago I got a complaint from a person that the software fails to extract text from a particular PDF. When I tested it, I found that the PDF file does not contain the original font names, which is why my software cannot proceed. When I checked the properties of this PDF file, it showed the PDF producer as GPL Ghostscript 8.15. I searched the net and studied the documentation related to fonts, but still could not find any clue to decoding the original font names.
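For reference, the font-name enumeration step described above can be sketched with Apache PDFBox (an assumption on my part; the original tool is written in C# and presumably uses a different library, but the PDF structures involved are the same):

```java
import java.io.File;
import org.apache.pdfbox.cos.COSName;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.PDResources;

public class ListFontNames {
    public static void main(String[] args) throws Exception {
        // File name is a placeholder for an InPage-generated PDF.
        try (PDDocument doc = PDDocument.load(new File("inpage-output.pdf"))) {
            for (PDPage page : doc.getPages()) {
                PDResources resources = page.getResources();
                for (COSName name : resources.getFontNames()) {
                    // For the problem file described above, this prints
                    // names like 'TTE25A5F90t00' instead of 'NOORIN05'.
                    System.out.println(resources.getFont(name).getName());
                }
            }
        }
    }
}
```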
The first thing you should do is try a more recent version of Ghostscript. 8.15 is 14 years old; the current version is 9.21.
If that does not preserve the original names (potentially including the usual subset prefix), then we'll need to see an example input file which exhibits the problem.
It might also be helpful if you were to explain why you need to extract the font names; possibly you are attempting something which simply isn't possible.
[EDIT]
OK, so now I understand the problem. I'm afraid the answer to your question is 'you can't get the original font name'.
The PDF file was created from the output of the (Adobe-created) Windows PostScript printer driver. When that driver embeds TrueType fonts into the PostScript stream as Type 42 fonts, it gives them a pseudo-random name composed of 'TT' followed by some additional characters that may look like hex, but aren't.
Old versions of the Ghostscript pdfwrite device (and 8.15 is very old) simply used that name verbatim, and that's what has been used for the font names in the PDF file you supplied.
Newer versions are capable of digging further into the font and picking up the original font name which is present in the PostScript. Unfortunately the old versions didn't preserve that. Once you've thrown the information away there is no way to get it back again.
So if the only thing you have is this PDF file, then it's simply not possible to get the font names back. If the person who supplied you with the PDF file can remake it using a more recent version of Ghostscript, then it will work. But I presume they don't have the PostScript program used to create a 14-year-old file.
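For completeness: if the original PostScript does turn up, remaking the PDF with a current Ghostscript is a single pdfwrite invocation. A hypothetical sketch driving it from Java (the file names are placeholders, and this assumes the gs binary is on the PATH):

```java
import java.io.IOException;

public class RemakePdf {
    public static void main(String[] args) throws IOException, InterruptedException {
        // Re-distill the original PostScript with a modern Ghostscript,
        // whose pdfwrite device preserves the original font names.
        Process gs = new ProcessBuilder(
                "gs", "-dBATCH", "-dNOPAUSE",
                "-sDEVICE=pdfwrite",
                "-o", "remade.pdf",
                "original.ps")
                .inheritIO()   // pass Ghostscript's console output through
                .start();
        System.exit(gs.waitFor());
    }
}
```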
I have a problem related to PDF export in the Pentaho BI platform. I'm not able to produce a correct PDF file, encoded in UTF-8, that contains Spanish characters. The procedure works properly neither in the local Report Designer nor on the BI server. Special characters like 'ñ' or 'ç' are skipped in the PDF file. Generation in other formats works just fine (HTML, Excel, etc.).
I've been struggling with this issue for a few days, unable to find any solution, and would be grateful for any clue.
Thanks in advance
P.S. Report Designer and BI platform version 6.1.0.1
Seems like a font issue. Your font needs to know how to work with Unicode, and it needs to specify how to "draw" the characters you want.
Office programs (at least MS Office) by default automatically select a font that can render any given character (if font substitution is enabled); PDF readers, however, don't do this: they always use the exact font you've specified.
When selecting an appropriate font, you have to pay attention to the Unicode characters it supports and to the font's license: some fonts don't allow embedding, and Pentaho embeds the used subset of the font into generated PDF files if the encoding is UTF-8 or Identity-H.
To install fonts on a Linux server, copy the font files either to your java/lib/fonts/ folder or to /usr/share/fonts/, grant read rights to the server's user, and restart the server application.
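Since Pentaho runs on the JVM, you can sanity-check whether a candidate font actually covers the characters being dropped with java.awt.Font. A small sketch (the font name is a placeholder; note the JVM silently substitutes a default face if the named font isn't installed, so run this on the server itself):

```java
import java.awt.Font;

public class FontCoverageCheck {
    public static void main(String[] args) {
        // Replace with the font your report actually specifies.
        Font font = new Font("DejaVu Sans", Font.PLAIN, 12);
        String sample = "ñç";

        // canDisplayUpTo returns -1 if every character can be rendered,
        // otherwise the index of the first character the font lacks.
        int missing = font.canDisplayUpTo(sample);
        if (missing == -1) {
            System.out.println(font.getFontName() + " covers all sample characters.");
        } else {
            System.out.println(font.getFontName() + " cannot render: " + sample.charAt(missing));
        }
    }
}
```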
We are developing a printing server that allows users to upload a DOC file and print it out via HP ePrint. It needs to support page extraction.
I tried to use a macro (with the help of Adobe Acrobat Pro and MS Word) to extract pages into a PDF. But it turns out that the resulting PDF may be larger than expected.
Is there any way to extract pages (without losing formatting, e.g. tables in the DOC) from DOC to DOC, so that the extracted document stays approximately the expected size?
This is a difficult requirement. It sounds like you have run into two problems (large PDFs and format loss) at the outset. You should probably say more about what you mean by "extraction" and why PDF is part of your solution, because that's quite different from "upload and print" and "DOC to DOC"; that way readers will have more suggestions for you.
I would suggest you try to approach the problem from a different angle if possible, because I suspect you are otherwise unlikely to achieve a good, efficient, stable result. One possible approach is to turn the DOC into PDF and then use iText or some other PDF library to manipulate the PDF before printing; see the sketch below. It really depends on what you are trying to achieve: the specifics of your extract/merge/convert.
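To illustrate that angle, here is a minimal iText 5 sketch that keeps only a page range from the intermediate PDF (file names and the range are placeholders):

```java
import java.io.FileOutputStream;
import com.itextpdf.text.pdf.PdfReader;
import com.itextpdf.text.pdf.PdfStamper;

public class ExtractPages {
    public static void main(String[] args) throws Exception {
        // Keep only pages 1-3 of the converted document (placeholder range),
        // then write the reduced document out for printing.
        PdfReader reader = new PdfReader("converted.pdf");
        reader.selectPages("1-3");
        PdfStamper stamper = new PdfStamper(reader, new FileOutputStream("extract.pdf"));
        stamper.close();
        reader.close();
    }
}
```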
I'm working on a calculation program which creates graphs from input data with ZedGraph. My client would like to embed those graphics into Microsoft Word and then publish the document as PDF. Both PNGs and enhanced metafiles produce badly rasterized results in the PDF.
I've tested this with Office 2007 and the "built-in" PDF publisher.
Can you recommend a workflow that avoids breaking the vector data on the way to PDF?
Update
Thanks for all the answers. It turned out that .NET doesn't actually create metafiles when writing to disk; see the respective question. Once I started using P/Invoke to create real metafiles on disk (instead of the automatic PNG fallback), the quality of the generated PDFs and prints improved vastly.
What about embedding Excel graphs?
The shareware graphics filter importps allows you to insert PDF and PS files into Word documents.
A demo version that scrambles colors is available at:
http://www.helga-glunz.homepage.t-online.de/importps
If all else fails, you might try creating your PNG files at a crazy-high resolution. This might make the rasterization fine-grained enough not to be noticeable.
I don't know anything about ZedGraph, but if you can export to (or somehow get to) an EPS file, that should work.
I often need to get vector artwork out of a PDF for use in Word, and to do this I usually go via Adobe Illustrator to save as an EPS. [Illustrator just happens to be something that I have available - I'm not saying there's anything magical about it; you may be able to create EPS files via another route.]
I see you're using Office 2007 and I can't say I have much experience with that, but the situation with Word 2003 is that you can insert an EPS that was exported from Adobe Illustrator if you choose "Illustrator 8.0" format when exporting from Illustrator. Newer versions of Illustrator seem to create files that Word 2003 can't handle. (Word 2007 may be better in this respect).