I have a PDF file compiled with LaTeX (using the XeLaTeX engine so that I can use an Arabic font). I want to upload it to the web and prevent copying and pasting of its content.
Looking for freeware to do that, I came across two powerful tools, namely ImageMagick and Ghostscript. All I need is to convert a text PDF to an image PDF in one go, preferably with batch processing if possible (to convert many PDFs at once).
I run this command on the command line, and it works fine for PDFs written in English:
convert someenglish.pdf output.pdf
Now when I do the same for an Arabic PDF I get this error:
convert.exe: PDFDelegateFailed `[ghostscript library] -q -dQUIET -dSAFER -dBATCH
-dNOPAUSE -dNOPROMPT -dMaxBitmap=500000000 -dAlignToPixels=0 -dGridFitTT=2 "-sD
EVICE=pngalpha" -dTextAlphaBits=4 -dGraphicsAlphaBits=4 "-r72x72" "-sOutputFile
=C:/Users/doctorate/AppData/Local/Temp/magick-65203BNMxTDhXtkF%d" "-fC:/Users/doctorate/Ap
pData/Local/Temp/magick-65206AK54hOoKA62" "-fC:/Users/doctorate/AppData/Local/Temp/ma
gick-6520hDn-KMyTyxy2"': **** Error reading a content stream. The page may be
incomplete.
**** Incorrect object count in object stream.
Error: /rangecheck in resolveobjectstream
Operand stack:
78424 10 1 10 --dict:7/15(L)-- 26 --nostringval-- 35 --nostri
ngval-- --dict:2/2(L)-- --dict:2/2(L)-- --dict:2/2(L)-- --dict:2/2(L)--
--dict:4/4(L)-- --dict:4/4(L)-- --dict:4/4(L)-- --dict:4/4(L)-- --dict
:4/4(L)-- --dict:3/3(L)-- --dict:2/2(L)-- --nostringval-- --dict:7/7(L)-
- --dict:10/10(L)-- --nostringval-- --nostringval-- Type Font Subtyp
e CIDFontType2 BaseFont MYCROL+(AH
Execution stack:
%interp_exit .runexec2 --nostringval-- --nostringval-- --nostringval-
- 2 %stopped_push --nostringval-- --nostringval-- --nostringval-- fa
lse 1 %stopped_push 1983 1 3 %oparray_pop 1982 1 3 %oparray_
pop 1966 1 3 %oparray_pop --nostringval-- --nostringval-- --nostri
ngval-- --nostringval-- --nostringval-- --nostringval-- --nostringval--
--nostringval-- --nostringval--
Dictionary stack:
--dict:1193/1684(ro)(G)-- --dict:1/20(G)-- --dict:82/200(L)-- --dict:82
/200(L)-- --dict:116/127(ro)(G)-- --dict:280/300(ro)(G)-- --dict:24/32(L)-
-
Current allocation mode is local
GPL Ghostscript 9.15: Unrecoverable error, exit code 1
# error/pdf.c/InvokePDFDelegate/263.
convert.exe: no images defined `test.pdf' # error/convert.c/ConvertImageCommand/
3210.
Question
What am I missing here? I am not a programmer, so please consider this in your answer. I would be very grateful if you could show how to do this as a batch process.
Notes
Windows 7 32bit
Ghostscript version 9.15
Image quality is not an issue for me; even 72 dpi will be fine.
I want to strike a balance between output size and text clarity. I just want the text to be readable on the web, not to run OCR on it, so the image doesn't need to be very sharp. Output size is more important (the smaller the better), and honestly I have no idea which works better in this case: converting the PDF into PNG or into JPEG.
I don't want to burst a PDF into multiple serially named PNGs or JPEGs; I simply want one PDF converted to another PDF, but with images inside and no text that can be copied and pasted.
Update
I tried to make a minimal working example (MWE) PDF to mimic the original PDF and found that the problem arises when a certain Arabic font called (AH) Manal Black is included. Running pdffonts from the command line on this MWE PDF gives:
Syntax Error (18062): Illegal character ')'
Syntax Error (18076): Dictionary key must be a name object
Syntax Error (18085): Dictionary key must be a name object
Syntax Error (18248): Illegal character ')'
Syntax Error (18248): Dictionary key must be a name object
Syntax Error (18253): Dictionary key must be a name object
Syntax Error (18599): Illegal character ')'
Syntax Error (18599): Dictionary key must be a name object
Syntax Error (18607): Dictionary key must be a name object
name                                 type              emb sub uni object ID
------------------------------------ ----------------- --- --- --- ---------
GAKHDJ+(AH                           CID TrueType      yes yes yes      5  0
HTCSVQ+Amiri-Regular                 CID TrueType      yes yes yes      7  0
If I exclude this Arabic font when compiling the document with the LaTeX/XeTeX engine, the convert command works just fine, as it does for the English PDFs. So most probably this problem is linked to the parsing of the fonts.
Update: A minimal working example is here: https://www.dropbox.com/s/qdeuzips0ivas4q/mwe_ar.pdf?dl=0
The minimal working example has PDF object no. 10 as an ObjStm (object stream), where this part can be found (I edited the whitespace formatting for improved readability):
<< /Type /Font
/Subtype /Type0
/BaseFont /GAKHDJ+#28AH)#20Manal#20Black
/Encoding /Identity-H
/DescendantFonts [4 0 R]
/ToUnicode 12 0 R
>>
So in the font name, (AH) Manal Black, the blanks are properly hex-escaped as #20 and the opening parenthesis ( as #28, but the closing parenthesis ) is not hex-escaped as #29, as it should be.
Without knowing more about the PDF generating process, I guess that the Creator/Producer combo as given through the file's metadata,
Creator: XeTeX output 2015.05.01:1207
Producer: xdvipdfmx (20140317)
is to blame. This is a bug in the PDF-generating software.
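To make the escaping rule concrete, here is a minimal Python sketch of how a font name could be turned into a PDF name token. The helper name escape_pdf_name is hypothetical, and the rule it encodes is my reading of the PDF specification: any byte outside the printable range 0x21 to 0x7E, any delimiter character, and the # character itself are written as # followed by two hex digits.

```python
# Hypothetical helper for illustration; it is not part of xdvipdfmx or any PDF library.
DELIMITERS = b"()<>[]{}/%"

def escape_pdf_name(name: str) -> str:
    """Return a PDF name token (with its leading slash) for the given string."""
    out = ["/"]
    for byte in name.encode("utf-8"):
        # Regular characters are printable, not delimiters, and not '#' (0x23).
        is_regular = 0x21 <= byte <= 0x7E and byte not in DELIMITERS and byte != 0x23
        out.append(chr(byte) if is_regular else "#%02X" % byte)
    return "".join(out)

print(escape_pdf_name("GAKHDJ+(AH) Manal Black"))
```

With this rule the closing parenthesis becomes #29, which is exactly the piece missing from the BaseFont entry above.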
Update
Maybe I should reveal how I dissected and uncompressed the MWE PDF:
Trying it with QPDF didn't work:
qpdf --qdf --object-streams=disable mwe_ar.pdf qdf.pdf
object stream 10 (file position 585): unexpected )
Trying it with pdftk didn't work either:
pdftk mwe_ar.pdf cat pdftk.pdf uncompress
Error: Unable to find file.
Error: Failed to open PDF file:
mwe_ar.pdf
Errors encountered. No output created.
Done. Input errors, so no output created.
Trying it with MuPDF's mutool also failed:
mutool clean -d mwe_ar.pdf mutool.pdf
warning: lexical error (unexpected ')')
error: invalid key in dict
error: cannot parse dict
error: cannot open object stream (10 0 R)
error: cannot load object stream containing object (1 0 R)
warning: cannot load object (1 0 R) into cache
warning: lexical error (unexpected ')')
error: invalid key in dict
error: cannot parse dict
error: cannot open object stream (10 0 R)
error: cannot load object stream containing object (4 0 R)
error: cannot load object (4 0 R) into cache
Finally, as a last resort, PeePDF.py to the rescue:
$ cat peepdf-commands.txt
object 10
$ peepdf.py -s peepdf-commands.txt
<< /Length 1000
/N 13
/Type /ObjStm
/Filter /FlateDecode
/First 84 >>
stream
9 0 3 72 11 133 2 197 1 312 15 343 4 446 14 625 19 876 6 1344 18 1514 5 1758 7 1886 <</Font<</F1 5 0 R/F2 7 0 R>>/ProcSet[/PDF/Text/ImageC/ImageB/ImageI]>>
<</Resources 9 0 R/Type/Page/Parent 11 0 R/Contents[8 0 R]>>
<</Type/Pages/Count 1/Kids[3 0 R]/MediaBox[0 0 595.28 841.89]>>
<</Creator( XeTeX output 2015.05.01:1207)/Producer(xdvipdfmx \(20140317\))/CreationDate(D:20150501120749+01'00')>>
<</Pages 11 0 R/Type/Catalog>>
[417[251]421[257]424[368]443[470]445[355]450[380]480[322]498[480 233]505[461]508[256]514[326]520[264]]
<</Type/Font/Subtype/CIDFontType2/BaseFont/GAKHDJ+#28AH)#20Manal#20Black/FontDescriptor 14 0 R/CIDSystemInfo<</Registry(Adobe)/Ordering(Identity)/Supplement 0>>/DW 199/W 15 0 R>>
<</Type/FontDescriptor/Ascent 529/Descent -415/StemV 109/CapHeight 529/AvgWidth 392/FontBBox[-112 -321 1006 1137]/ItalicAngle 0/Flags 6/Style<</Panose<000000000000000000000000>>>/FontName/GAKHDJ+#28AH)#20Manal#20Black/FontFile2 16 0 R/CIDSet 17 0 R>>
[39[693]41[522]51[535]108[415]124[415]388[218 926]402[1213]406[541]446[586]1886[317]1992[229]2016[366]2021[366]2105[244]2108[244]2139[1006]2150[295]2162[378]2227[379 452]2272[589]2294[176]2300[198]2308[389]2339[343]2356[723]2359[1079]2397[552]2413[346]2457[177]2491[299]2912[349]2952[219]2969[209]2973[148]2976[302]2981[341]3027[168]3149[550]3297[259]3325[292]3726[248]3732[319]3853[411]3893[179]4021[55]4323[104]4627[560]5068[238]5106[476]5322[159]5328[222]6366[93]]
<</Type/Font/Subtype/CIDFontType2/BaseFont/HTCSVQ+Amiri-Regular/FontDescriptor 18 0 R/CIDSystemInfo<</Registry(Adobe)/Ordering(Identity)/Supplement 0>>/DW 190/W 19 0 R>>
<</Type/FontDescriptor/Ascent 1123/Descent -635/StemV 87/CapHeight 1123/AvgWidth 685/FontBBox[-581 -900 11467 1815]/ItalicAngle 0/Flags 6/Style<</Panose<000000000500000000000000>>>/FontName/HTCSVQ+Amiri-Regular/FontFile2 20 0 R/CIDSet 21 0 R>>
<</Type/Font/Subtype/Type0/BaseFont/GAKHDJ+#28AH)#20Manal#20Black/Encoding/Identity-H/DescendantFonts[4 0 R]/ToUnicode 12 0 R>>
<</Type/Font/Subtype/Type0/BaseFont/HTCSVQ+Amiri-Regular/Encoding/Identity-H/DescendantFonts[6 0 R]/ToUnicode 13 0 R>>
endstream
The more often I use PeePDF.py, the more I love it. Thanks, Jose Miguel, for this wonderful tool!
I usually use pdftocairo to fix that:
pdftocairo corruptedinfile.pdf -pdf outfile.pdf
After that, Ghostscript can handle it properly.
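Since the question also asked for batch processing, the repair step and the conversion step could be chained over a whole folder. The following is only a sketch under assumptions: the function names, the output file naming, and the folder layout are made up for illustration, and it assumes that pdftocairo and ImageMagick's convert are both on the PATH. The 72 dpi density is taken from the question's notes.

```python
import subprocess
from pathlib import Path

def build_commands(src: Path, out_dir: Path):
    """Return the two argv lists for one PDF: repair first, then rasterize."""
    repaired = out_dir / (src.stem + "_repaired.pdf")   # hypothetical naming scheme
    image_pdf = out_dir / (src.stem + "_image.pdf")     # hypothetical naming scheme
    repair = ["pdftocairo", "-pdf", str(src), str(repaired)]
    rasterize = ["convert", "-density", "72", str(repaired), str(image_pdf)]
    return [repair, rasterize]

def convert_folder(in_dir: Path, out_dir: Path) -> None:
    """Run both steps for every PDF in in_dir, stopping at the first failure."""
    out_dir.mkdir(parents=True, exist_ok=True)
    for src in sorted(in_dir.glob("*.pdf")):
        for argv in build_commands(src, out_dir):
            subprocess.run(argv, check=True)

if __name__ == "__main__":
    # Only print the commands for one file; call convert_folder() to really run them.
    for argv in build_commands(Path("mwe_ar.pdf"), Path("out")):
        print(" ".join(argv))
```

Printing the commands first makes it easy to try them by hand on a single file before running the loop over everything.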
Related
GhostScript version: 9.53.3
I need to stamp all pages of a PDF with (Entwurf).
<<
/EndPage
{
gsave
/Helvetica-Bold 120 selectfont .5 .setfillconstantalpha .45 setgray
130 430 moveto 30 rotate (Entwurf) show -30 rotate
130 130 moveto 30 rotate (Entwurf) show -30 rotate
grestore
true
} bind
>> setpagedevice
Without .5 .setfillconstantalpha this script runs successfully. I used 0.5 .setopacityalpha before, but it crashes the same way and has been deprecated since 9.53.
cat start.pdf | gs -q -dBATCH -dNOPAUSE -sDEVICE=pdfwrite -sOutputFile=/workspace/test.pdf Watermark_def.ps -
Error: /undefined in /--.endpage--
Operand stack:
(/tmp/gs_KvXeWS) --nostringval-- --dict:7/16(L)-- false --dict:4/6(L)-- 2 0 2 0.5
Execution stack:
%interp_exit .runexec2 --nostringval-- .endpage --nostringval-- 2 %stopped_push --nostringval-- .endpage .endpage false 1 %stopped_push 1990 1 3 %oparray_pop 1989 1 3 %oparray_pop 1977 1 3 %oparray_pop .endpage 1978 3 3 %oparray_pop .endpage .endpage 2 1 12 .endpage %for_pos_int_continue .endpage 1938 4 7 %oparray_pop .endpage 1820 6 7 %oparray_pop .endpage 9 .endpage
Dictionary stack:
--dict:731/1123(ro)(G)-- --dict:1/20(G)-- --dict:80/200(L)-- --dict:80/200(L)-- --dict:133/256(ro)(G)-- --dict:320/325(ro)(G)-- --dict:31/32(L)--
Current allocation mode is local
GPL Ghostscript 9.53.3: Unrecoverable error, exit code 1
That isn't a crash. It's a graceful exit on an error condition.
.setfillconstantalpha is a non-standard PostScript extension. PostScript does not support transparency (except in some limited fashions), so you can't use alpha blending in a normal PostScript program.
Ghostscript, because of the needs of PDF, does support transparency in the graphics library, and some limited support was added to the PostScript language to enable its use. This is decidedly non-standard and will not work on any other PostScript interpreter.
However, the use of these extensions is dependent on various operations happening in the correct sequence; get it wrong and it may crash. So by default they are disabled, in order to prevent malicious programs from crashing the interpreter (or worse).
If you want to use these extensions (and take the risk of things blowing up) then you need to specifically allow them. This is documented; see the online Ghostscript documentation and look for -dALLOWPSTRANSPARENCY.
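As a sketch of where the switch would go, the following Python snippet builds the command from the question with -dALLOWPSTRANSPARENCY added. The function name is hypothetical, and passing the PDF as a file argument instead of piping it through stdin is my own simplification; nothing is executed, the command is only assembled and printed.

```python
def watermark_command(input_pdf: str, output_pdf: str, stamp_ps: str = "Watermark_def.ps"):
    """Build the Ghostscript argv list; nothing is run here."""
    return [
        "gs", "-q", "-dBATCH", "-dNOPAUSE",
        "-dALLOWPSTRANSPARENCY",   # opt in to the non-standard transparency operators
        "-sDEVICE=pdfwrite",
        "-sOutputFile=" + output_pdf,
        stamp_ps,                  # the setpagedevice /EndPage definition from the question
        input_pdf,
    ]

print(" ".join(watermark_command("start.pdf", "/workspace/test.pdf")))
```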
I was trying to convert a JPG to PDF using Ghostscript. The command I was using was
C:\Users\kbged\Desktop\XAMPP\php\bin\gs\gs9.54.0\bin\gswin64c -sDEVICE=pdfwrite -sOutputFile=C:\Users\kbged\Pictures\php48D4.pdf C:\Users\kbged\Desktop\XAMPP\php\bin\gs\gs9.54.0\lib\viewjpeg.ps -c (C:\Users\kbged\Pictures\group_image.jpg) viewJPEG
However, this ends with the following error:
GPL Ghostscript 9.54.0 (2021-03-30)
Copyright (C) 2021 Artifex Software, Inc. All rights reserved.
This software is supplied under the GNU AGPLv3 and comes with NO WARRANTY:
see the file COPYING for details.
Error: /stackunderflow in --dup--
Operand stack:
Execution stack:
%interp_exit .runexec2 --nostringval-- --nostringval-- --nostringval-- 2 %stopped_push --nostringval-- --nostringval-- --nostringval-- false 1 %stopped_push .runexec2 --nostringval-- --nostringval-- --nostringval-- 2 %stopped_push --nostringval-- --nostringval--
Dictionary stack:
--dict:732/1123(ro)(G)-- --dict:0/20(G)-- --dict:78/200(L)-- --dict:8/20(L)--
Current allocation mode is local
GPL Ghostscript 9.54.0: Unrecoverable error, exit code 1
I tried to look into that viewjpeg.ps file and almost towards the end I see
% This lets you do stuff on the command line like:
% gs -sDEVICE=pdfwrite -o stuff%03d.pdf viewjpeg.ps -c "(image.jpg) << /PageSize 2 index viewJPEGgetsize 2 array astore >> setpagedevice viewJPEG"
% so the output size matches the original image.
I actually don't understand where the error lies...
Any help would be appreciated! Thanks!
There are several ways to convert an image to PDF, and this is possibly not the best way to convert JPEG to PDF in Ghostscript, but you asked why it did not work for you.
The following, saved in a .cmd file, should first work as per the example; once that is proven, try changing the parts of the last line that you think need altering. Note that long lines are very fragile and there must be NO space after ^.
Also, the order and combinations of options can easily throw GS into an error state.
In CMD, filenames are best always given as "FullPath\quoted". If you don't want to initially CD to the output directory (which makes it much easier to avoid using an escaped name), then change the start of the last line to
-c "("C:\\Users\\kbged\\Pictures\\group_image.jpg") <<......
Be aware that a much simpler method would be to write your own input.PS file, which avoids potential command line truncation.
@echo off
REM switch to folder of jpeg
cd "C:\Users\kbged\Pictures"
"C:\Users\kbged\Desktop\XAMPP\php\bin\gs\gs9.54.0\bin\gswin64c.exe" -dNOSAFER -sDEVICE=pdfwrite ^
-o "C:\Users\kbged\Pictures\php48D4.pdf" "C:\Users\kbged\Desktop\XAMPP\php\bin\gs\gs9.54.0\lib\viewjpeg.ps" ^
-c "(group_image.jpg) << /PageSize 2 index viewJPEGgetsize 2 array astore >> setpagedevice viewJPEG"
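The doubled backslashes in the escaped variant come from PostScript string syntax, where a backslash inside ( ) starts an escape sequence. As a rough illustration, this Python helper (the name ps_string is made up) wraps a Windows path in a PostScript string literal. It also escapes parentheses; balanced parentheses would not strictly need it, but escaping them is always safe.

```python
def ps_string(text: str) -> str:
    """Wrap text in a PostScript string literal, escaping backslashes and parentheses."""
    escaped = (
        text.replace("\\", "\\\\")   # backslash first, so later escapes are not doubled
            .replace("(", "\\(")
            .replace(")", "\\)")
    )
    return "(" + escaped + ")"

print(ps_string(r"C:\Users\kbged\Pictures\group_image.jpg"))
```

The printed literal is what would go in front of the << ... >> setpagedevice viewJPEG part of the -c argument.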
I'm trying to remove the vectors from some PDF files. Ghostscript (gs) works fine with the -dFILTERVECTOR option:
gswin64c -o "test_out.pdf" -sDEVICE=pdfwrite -dFILTERVECTOR "test.pdf"
but when I run this command on large PDF files (larger than 100MB, with more than 1000 pages), I get the following error, with a blank PDF file as output:
Page 1139
Page 1140
**** Error: can't process embedded font stream,
attempting to load the font using its name.
Output may be incorrect.
Querying operating system for font files...
Substituting font Courier for AVFCLE+CourierNewPSMT.
Can't find (or can't open) font file %rom%Resource/Font/NimbusMonoPS-Regular.
Can't find (or can't open) font file NimbusMonoPS-Regular.
Can't find (or can't open) font file %rom%Resource/Font/NimbusMonoPS-Regular.
Can't find (or can't open) font file NimbusMonoPS-Regular.
Didn't find this font on the system!
Unable to substitute for font.
**** Error reading a content stream. The page may be incomplete.
Output may be incorrect.
Error: /dictfull in --filter--
Operand stack:
--dict:7/15(L)-- --nostringval-- 9 F_2 26049 11 FontObject --dict:10/18(L)-- false --dict:4/12(L)-- --nostringval-- --nostringval--
Execution stack:
%interp_exit .runexec2 --nostringval-- filter --nostringval-- 2 %stopped_push --nostringval-- filter filter false 1 %stopped_push 1992 1 3 %oparray_pop 1991 1 3 %oparray_pop 1979 1 3 %oparray_pop 1980 1 3 %oparray_pop filter filter 1141 1 1277 filter %for_pos_int_continue 1983 1 7 %oparray_pop filter filter filter filter %array_continue filter filter filter filter filter %array_continue 1827 13 10 %oparray_pop
Dictionary stack:
--dict:734/1123(ro)(G)-- --dict:1/20(G)-- --dict:80/200(L)-- --dict:80/200(L)-- --dict:133/256(ro)(G)-- --dict:317/325(ro)(G)-- --dict:33/64(L)-- --dict:6/9(L)-- --dict:6/20(L)-- --dict:9/15(L)--
Current allocation mode is local
GPL Ghostscript 9.27: Unrecoverable error, exit code 1
Unrecoverable error: VMerror in --.systemvmSFD--
Operand stack:
--nostringval-- --nostringval-- 0
GPL Ghostscript 9.27: ERROR: A pdfmark destination page 1277 points beyond the last page 1139.
It seems that the problem is related to a font issue on page 1140. But in fact, if I treat the file as 2 parts, each part works fine with no problem:
part1: pages from 1 to 1000
gswin64c -o "test_part1.pdf" -sDEVICE=pdfwrite -dFILTERVECTOR -sPageList=-1000 "test.pdf"
part2: from 1001 till the last page (around 1900)
gswin64c -o "test_part2.pdf" -sDEVICE=pdfwrite -dFILTERVECTOR -sPageList=1001- "test.pdf"
So, if I understand correctly, the problem seems to be related more to the number of pages or the size of the PDF file.
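The two-part workaround can be generalised to any number of parts. Below is a minimal Python sketch that only builds the command lines; the function names and the 1000-page chunk size are assumptions for illustration, and the total page count has to be known beforehand.

```python
def page_ranges(total_pages: int, chunk: int):
    """Return -sPageList values such as '1-1000' that cover pages 1..total_pages."""
    ranges = []
    for start in range(1, total_pages + 1, chunk):
        end = min(start + chunk - 1, total_pages)
        ranges.append(str(start) if start == end else "%d-%d" % (start, end))
    return ranges

def chunk_commands(src: str, total_pages: int, chunk: int = 1000):
    """Build one Ghostscript argv list per page range (nothing is executed)."""
    commands = []
    for number, pages in enumerate(page_ranges(total_pages, chunk), start=1):
        commands.append([
            "gswin64c", "-o", "test_part%d.pdf" % number,
            "-sDEVICE=pdfwrite", "-dFILTERVECTOR",
            "-sPageList=" + pages, src,
        ])
    return commands

for argv in chunk_commands("test.pdf", 1980):
    print(" ".join(argv))
```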
The PDF files that produce the above results are private, so I can't upload them. But I have created a 175MB test PDF file which gives a similar issue:
Page 1345
**** Error reading a content stream. The page may be incomplete.
Output may be incorrect.
**** Error: File did not complete the page properly and may be damaged.
Output may be incorrect.
Page 1346
*** ERROR: The font BCDEEE+Calibri is damaged and cannot be used. Switching to a
last-ditch fallback, text may not render correctly, or at all.
**** Error reading a content stream. The page may be incomplete.
Output may be incorrect.
**** Error: File did not complete the page properly and may be damaged.
Output may be incorrect.
Page 1347
**** Error: can't process embedded font stream,
attempting to load the font using its name.
Output may be incorrect.
Substituting font Helvetica for BCDEEE+Calibri.
**** Error reading a content stream. The page may be incomplete.
Output may be incorrect.
**** Error: File did not complete the page properly and may be damaged.
Output may be incorrect.
Page 1348
Error: /VMerror in --filter--
VM status: 4 43671928 45257592
Current allocation mode is local
Last OS error: 2
GPL Ghostscript 9.27: Unrecoverable error, exit code 1
Any idea how to solve this issue, knowing that I'm using the latest version of Ghostscript, 9.27 64-bit, on Windows 10?
I 'suspect' that the problem is not related to the use of -dFILTERVECTOR. What happens if you try leaving that off the command line?
You should also try the most recent Ghostscript code (not yet released) which addresses what may be a related problem.
This commit addressed this bug report, which is similar to what you report here. I suspect you have simply exhausted memory (or at least, memory addressable by Ghostscript).
[EDIT]
Having tested the file, it exhausts memory for me, using 2GB, after 17452 pages (and doesn't require the FILTERVECTOR switch, as I expected).
There is no solution to this. The pdfwrite device needs to keep a significant amount of content in memory in order to keep processing performance reasonable.
In addition your file embeds a new copy of each font on every page. Even though each of those fonts has the same name, we must treat each as a unique font. Failing to do so would mean that we might use the wrong font.
So your file has 1980 pages, with 5 fonts on every page, which makes 9,900 fonts. I strongly suspect that the overhead of keeping all these font copies in memory is what is consuming so much. A quick estimate (looking at the decompressed font stream size) would be that the fonts alone would occupy some 792MB of memory. Once you add the Encoding, Widths array, etc., this could easily be the major source of memory usage.
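The arithmetic behind that estimate can be written out as follows. Note that the per-font size is not stated above; the roughly 80 KB figure here is simply what the two quoted numbers imply when divided, so treat it as a derived assumption rather than a measurement.

```python
pages = 1980
fonts_per_page = 5
total_fonts = pages * fonts_per_page              # one fresh copy of each font per page

estimated_total_mb = 792                          # figure quoted in the estimate above
per_font_kb = estimated_total_mb * 1000 / total_fonts   # implied average size per font copy

print(total_fonts, "font copies, about", per_font_kb, "KB each")
```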
I am trying to fix a problem with pure black text in PDF conversion. I am able to convert my sRGB PDF to a CMYK PDF, but the text colors are not kept as pure black. I already tried the trick from "ghostscript: convert PDF into CMYK preserving pure Black for text", but even though I have the "apple_..." files in my current directory, I always get:
gsicc_open_search(): Could not find ~/temp/AdobeCPs/CMYK/apple_to_jNP_neutrals.icc .
The command I am using:
gs -q -sDEVICE=pdfwrite -o sample.pdf -sColorConversionStrategy=CMYK \
-sSourceObjectICC=control.txt test.pdf
My output is:
./base/gsicc_manage.c:1088: gsicc_open_search(): Could not find Graphic_RGB apple_to_jNP_neutrals.icc 0 1 0
+ ./base/gsicc_manage.c:660: gsicc_set_srcgtag_struct(): setting of control.txt src obj color info failed
| ./base/gsicc_manage.c:2731: gs_setsrcgtagicc(): cannot find srctag file
While reading gs_lev2.ps:
Error: /unknownerror in --.setuserparams--
Operand stack:
(gs_res.ps\000gs_typ42.ps\000gs_cidfn.ps\000gs_cidcm.ps\000gs_fntem.ps\000gs_cidtt.ps\000gs_cidfm.ps\000gs_cmap.ps\000gs_setpd.ps\000gs_fapi.ps\000gs_typ32.ps\000gs_frsd.ps\000gs_ll3.ps\000gs_mex_e.ps\000gs_mro_e.ps\000gs_pdf_e.ps\000gs_wan_e.ps\000pdf...) (gs_lev2.ps) --dict:31/31(G)-- --dict:1/1(G)-- --dict:1/1(G)--
Execution stack:
%interp_exit --nostringval-- --nostringval-- %loop_continue --nostringval-- --nostringval-- --nostringval-- false 1 %stopped_push --nostringval-- --nostringval--
Dictionary stack:
--dict:798/1123(G)-- --dict:71/200(L)-- --dict:798/1123(G)-- --dict:133/251(G)-- --dict:21/25(L)--
Last OS error: No such file or directory
Current file position is 24631
Unrecoverable error: syntaxerror in --nostringval--
Operand stack:
gs_res.psgs_typ42.psgs_cidfn.psgs_cidcm.psgs_fntem.psgs_cidtt.psgs_cidfm.psgs_cmap.psgs_setpd.psgs_fapi.psgs_typ32.psgs_frsd.psgs_ll3.psgs_mex_e.psgs_mro_e.psgs_pdf_e.psgs_wan_e.pspdf_ops.psgs_l2img.pspdf_rbld.pspdf_base.pspdf_draw.pspdf_font.pspdf_main.pspdf_sec.psgs_cff.psgs_mgl_e.psgs_ttf.psgs_icc.psgs_dps.psgs_dpnxt.psgs_epsf.psgs_pdfwr.ps gs_lev2.ps --nostringval-- --nostringval-- --nostringval-- false
Unrecoverable error: undefined in .uninstallpagedevice
Operand stack:
gs_res.psgs_typ42.psgs_cidfn.psgs_cidcm.psgs_fntem.psgs_cidtt.psgs_cidfm.psgs_cmap.psgs_setpd.psgs_fapi.psgs_typ32.psgs_frsd.psgs_ll3.psgs_mex_e.psgs_mro_e.psgs_pdf_e.psgs_wan_e.pspdf_ops.psgs_l2img.pspdf_rbld.pspdf_base.pspdf_draw.pspdf_font.pspdf_main.pspdf_sec.psgs_cff.psgs_mgl_e.psgs_ttf.psgs_icc.psgs_dps.psgs_dpnxt.psgs_epsf.psgs_pdfwr.ps gs_lev2.ps --nostringval-- --nostringval-- --nostringval-- false
My control.txt file is:
Image_RGB apple_to_jNP_photo.icc 0 1 0
Graphic_RGB apple_to_jNP_neutrals.icc 0 1 0
Text_RGB apple_to_jNP_neutrals.icc 0 1 0
echo $GS_LIB
/usr/share/ghostscript/9.18/Resource
ls /usr/share/ghostscript/9.18/Resource
CIDFont CIDFSubst CMap ColorSpace Decoding Encoding Font IdiomSet Init SubstCID
Current directory:
~/temp/AdobeCPs/CMYK$ ls
AppleRGB.icc CoatedGRACoL2006.icc JapanColor2003WebCoated.icc USWebCoatedSWOP.icc
apple_to_jNP_neutrals.icc control.txt JapanWebCoated.icc USWebUncoated.icc
apple_to_jNP_photo.icc JapanColor2001Coated.icc sample.pdf WebCoatedFOGRA28.icc
CoatedFOGRA27.icc JapanColor2001Uncoated.icc test.pdf WebCoatedSWOP2006Grade3.icc
CoatedFOGRA39.icc JapanColor2002Newspaper.icc UncoatedFOGRA29.icc WebCoatedSWOP2006Grade5.icc
Could you help?
Even though this is quite an old thread, it does not seem to have been solved yet.
I had the same error as mentioned above when using blanks instead of tabs as separators in control.txt (tabs are what the Ghostscript color management white paper mentions).
Changing to tabs fixed the problem for me.
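Because tabs and blanks look identical in most editors, one way to be sure is to generate the file. This is a small Python sketch under that assumption; the entries are copied from the question's control.txt, and the function name is made up.

```python
# Entries copied from the control.txt shown in the question.
ENTRIES = [
    ("Image_RGB", "apple_to_jNP_photo.icc", 0, 1, 0),
    ("Graphic_RGB", "apple_to_jNP_neutrals.icc", 0, 1, 0),
    ("Text_RGB", "apple_to_jNP_neutrals.icc", 0, 1, 0),
]

def control_file_text(entries) -> str:
    """Join the fields of every entry with a single tab, one entry per line."""
    return "".join("\t".join(str(field) for field in entry) + "\n" for entry in entries)

if __name__ == "__main__":
    with open("control.txt", "w", newline="\n") as handle:
        handle.write(control_file_text(ENTRIES))
```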
Your command line and input file aren't relevant to the problem; the failure is during startup:
| ./base/gsicc_manage.c:2731: gs_setsrcgtagicc(): cannot find srctag file
While reading gs_lev2.ps:
gs_lev2.ps is part of Ghostscript's startup code. That's all executed before you even get to the point of reading the command line options.
You haven't said which version of Ghostscript you are using, on which OS, or where you got it from, but it looks to me like your version is fundamentally broken.
I infer from your text that you are using version 9.18. That's 5 versions (2.5 years) out of date; the current version is 9.23. I'd suggest you get the vanilla Ghostscript source from the downloads page and compile that.
I am converting Type 3 font glyphs from PDF to PostScript. The input file has inline images whose data streams have the FlateDecode filter applied. The filter uses Predictor 15.
Can anybody help me with how to carry the image streams over from PDF to PostScript?
This is how the input stream is given in pdf
32 0 obj
<<
/Length 342
>>
stream
37 0 4 -52 33 -1 d1
0.01 0 0 0.01 0 0 concat
gsave 2900 0 0 -5100 400 -100 concat
BI
/IM true
/W 29
/H 51
/BPC 1
/D[1
0]
/F/Fl
/DP<</Predictor 15
/Columns 29>>
ID xœ=Ì¡
Â`ÅñÿeÂLθ n`0>Ù`ñ
f[¦DŒF_ÁhC1ì%Ä)¶o.¢Ÿ"†ßá†s®àì]^ÏŠÅS³tFËÂÚ3sç'Æi èÐÇ:j‹¹¨åìOTÿ ª•ÉÙÕÅŸ¨‡¹Ó$°ÆΚWèÁ!¯Cê
÷0&f µtðV ©Ë÷iôíتÄ~Ø•Œöí&´« +ro#Ê‚ûÏÅùlßG'
EI gRestore
endstream
endobj
And here is what I am trying to write as output in PostScript:
/g21 {
37 0 4 -52 33 -1 setcachedevice
q
[0.01 0 0 0.01 0 0] concat
q
[2900 0 0 -5100 400 -100] concat
[ xœ…ѱNÃ0à3©p'l` ¢abä*‰'#‚W`KP¡00öQ`d# ¨CWž€u`‰štj4Ü]# /ù¤œíÿ| ÂìÊüå7úŠ‰V'‚ª¦zò¡9à*´º
m1Õ`ñ—íü‹‡½Gù#ãÝAVxc¥Ž®"6oFܬJHÃB3(æod¾…xFP†o$!v±Ã»·0—gØY÷J$û„`´#zÊ
Oí¼œÑ¸é`Ê}ü…ñ.Z¯›cF4\¡*O¤ÑPÒYòî¦/éG‘qÑç¼2>öq<Üœ<
B˜5‚²¢ºÎ/èqUTUàoÓ9͔Π܉ä²z ‡S×ÛÙC(PA²š7èT¾ŽCGÈRaLéåksnˆÃ0z<zø:ž=
]
0
<<
/ImageType 1
/Width 29
/Height 51
/ImageMatrix [29 0 0 -51 0 51]
/BitsPerComponent 1
/Decode [1 0]
/DataSource { 2 copy get exch 1 add exch }
<</Predictor 15
/Columns 29
>>
/FlateDecode filter
>>
imagemask
pop pop
gRestore
gRestore
} def
PostScript has mostly the same filters as PDF. You don't need to decompress the data, just use the FlateDecode filter in PostScript and leave the compressed data untouched.
Note you'll need Language Level 3 for Predictor 15 (or any other PNG predictor) but that shouldn't be a problem, level 3 has been the standard for 18 years.
Otherwise you'll need to implement a version of the FlateDecode filter which supports the PNG Predictor. I believe zlib is quite capable of this.
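For the second route, here is a rough Python sketch of what undoing a PNG predictor involves. It inflates the data with zlib and then reverses the per-row PNG filters by hand, on the assumption that with a PNG predictor (Predictor 10 to 15) every row starts with a filter-type byte from 0 to 4. The function names are made up, and the defaults match the 1-bit, 29-column mask from the question.

```python
import zlib

def paeth(left: int, up: int, up_left: int) -> int:
    """Paeth predictor from the PNG specification."""
    p = left + up - up_left
    pa, pb, pc = abs(p - left), abs(p - up), abs(p - up_left)
    if pa <= pb and pa <= pc:
        return left
    return up if pb <= pc else up_left

def flate_png_decode(data: bytes, columns: int, colors: int = 1, bpc: int = 1) -> bytes:
    """Inflate data, then undo the PNG row filters (types 0 to 4)."""
    raw = zlib.decompress(data)
    row_len = (columns * colors * bpc + 7) // 8   # bytes per row, without the type byte
    bpp = (colors * bpc + 7) // 8                 # bytes per pixel, at least 1
    out = bytearray()
    prev = bytearray(row_len)                     # the row above the first row is all zeros
    for pos in range(0, len(raw), row_len + 1):
        ftype = raw[pos]
        row = bytearray(raw[pos + 1:pos + 1 + row_len])
        for i in range(len(row)):
            left = row[i - bpp] if i >= bpp else 0
            up = prev[i]
            up_left = prev[i - bpp] if i >= bpp else 0
            if ftype == 1:                        # Sub
                row[i] = (row[i] + left) & 0xFF
            elif ftype == 2:                      # Up
                row[i] = (row[i] + up) & 0xFF
            elif ftype == 3:                      # Average
                row[i] = (row[i] + (left + up) // 2) & 0xFF
            elif ftype == 4:                      # Paeth
                row[i] = (row[i] + paeth(left, up, up_left)) & 0xFF
            elif ftype != 0:                      # 0 is None: bytes are stored as-is
                raise ValueError("unknown PNG filter type %d" % ftype)
        out += row
        prev = row
    return bytes(out)
```

Leaving the data compressed and letting the PostScript FlateDecode filter do this work is still the simpler option; the sketch is only meant to show what the predictor adds on top of plain inflation.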
[EDIT]
Your 'PostScript output' is incomplete, you are using PDF operators (q and Q) which you have not provided a definition for. Apart from anything else this makes it impossible to run the code through an interpreter. Kindly supply a complete simple example file, as requested. Not pasted code, I'm not inclined to go and create a file myself, and besides, binary doesn't cut and paste at all well.
Off the top of my head from desk checking I can't immediately see a problem, but since I can't run the code, I could easily be missing something.
[EDIT 2]
And that file, unsurprisingly, works fine.
You haven't supplied the PostScript file that you are creating. It's rather hard for me to tell what's wrong with the PostScript you created by looking at the PDF file you started with.
You could, of course, use Ghostscript (and I see you've used it to create the PDF file) to create a PostScript file, and then look at what that contains. If you set -dCompressFonts=false then the output font won't even be compressed.
For example:
37 0 4 -52 33 -1 d1
0.01 0 0 0.01 0 0 cm
q 2900 0 0 -5100 400 -99.9998 cm
BI
/IM true
/W 29
/H 51
/BPC 1
/D[1
0]
/F[/A85
/CCF]
/DP[null
<</K -1
/Columns 29>>]
ID
-D=,M5m+t^0_>op8\HM"Du]KKrr2rthqG/5qU_ik]$f$TlUslD91qoN93j0%dckk:ld^*DV25!+
!WX>~>
EI Q
Of course you'll need to look at the prolog to see how all the procedures used there are defined, but you can do that yourself; you certainly don't need me to do it. Notice that the imagemask uses the CCITTFax and ASCII85 decode filters; it's trivial to add additional filters. Since the data is guaranteed to be 'monochrome' (it's a mask), the CCITT filter generally gives superior compression to Flate.
Note that if you are really using Ghostscript 9.05 then you should upgrade, that is 6 years old.
It might possibly help if you were to explain why you want to take an ugly, bitmapped, type 3 font from PDF and make an ugly, bitmapped type 3 PostScript font from it.
[EDIT 3]
Well, looking at your PostScript file, the definition of the glyphs does not match what you've put in your question. The actual content looks like this:
/g10135{
88 0 4 -70 82 8 setcachedevice
q
[
0.01 0 0 0.01 0 0 ] M
q
[7800 0 0 -7800 400 800 ]M
<<
/ImageType 1
/Width 78
/Height 78
/ImageMatrix [ 78 0 0 -78 0 78]
/BitsPerComponent 1
/Decode [1
0]
/DataSource ....binary data.....
<< /Predictor 15
/Columns 78
/BitsPerComponent 1>>
/FlateDecode filter def
>> imagemask
Q
Q
}bind def
You have not supplied either a file, procedure or string source as a value for the DataSource key in the dictionary. Essentially, the PostScript interpreter reads and tokenises the /DataSource key, and then proceeds to process the binary as PostScript. Unsurprisingly this causes an error 'syntaxerror in (binary token, type=156)' when processed with Ghostscript.
If you had got past that then you would have discovered that the filter operator takes a data source as well and you haven't supplied one for that either.
So you need to create a data source for your binary data. Up to you how you do that but currentfile is one way. Or readstring given that you know the string length.
So something like:
<<
/ImageType 1
/Width 29
/Height 51
/ImageMatrix [29 0 0 -51 0 51]
/BitsPerComponent 1
/Decode [1 0]
/DataSource
<length> string dup
currentfile exch readstring
.....binary data.....
pop pop % drop readstring's boolean and substring, keeping the filled string
<<
/Predictor 15
/Columns 29
>> /FlateDecode filter
>> imagemask
Obviously you'll have to fill in <length> yourself, since you know the string length. The dictionary argument to FlateDecode looks to me like it shouldn't be needed.
[Edit 4]
I notice that this appears to be intended for commercial use. Nothing wrong with that, but I'm not going to do all your homework for you; if it's your job, it's up to you to learn the language well enough to do the job.
I'm skipping lightly over the actual implementation details below in an attempt to outline where you are going wrong. In practice things are a little more complex, I haven't discussed how the procedure stored in the CharStrings dictionary is created, or the difference with early name binding (which is an important concept in PostScript).
Your existing code is:
/g10135{
88 0 4 -70 82 8 setcachedevice
q
[
0.01 0 0 0.01 0 0 ] M
q
[7800 0 0 -7800 400 800 ]M
<<
/ImageType 1
/Width 78
/Height 78
/ImageMatrix [ 78 0 0 -78 0 78]
/BitsPerComponent 1
/Decode [1
0]
/DataSource {417 string dup
currentfile exch readstring}
...binary data....
<< /Predictor 15
/Columns 78
>>/FlateDecode filter def
>> imagemask
Q
Q
}bind def
So, the PostScript interpreter reads those bytes one at a time, and converts them into tokens. This either results in an executable token, which is executed, or an operation on one of the stacks.
So /g10135 is terminated by the { character, because that's a reserved character. The / introduces a name object, so we end up with the name object g10135 which we push on to the operand stack. The { character introduces an executable array so we put a mark on the operand stack.
Next we read 88, terminated by a white space character. That's a numeric so we store that on the operand stack, likewise the other numbers. The operand stack now contains:
/g10135
mark
88
0
4
-70
82
8
We then read setcachedevice, which is terminated by a white space. That isn't a standard token so the interpreter starts looking through the dictionaries on the dictionary stack, looking for a definition. Since it is a standard operator, we find it in systemdict and execute it. That consumes 6 operands from the operand stack, it has no other effects (actually it does, but this is a bit special because we are executing inside a font, but we'll ignore that for now).
Next we encounter a q, again this is looked up in every dictionary on the dictionary stack to find a definition. This is defined in your own prolog as a gsave, so it takes no operands and returns no operands, it simply saves the graphics state, incrementing the save depth by 1.
I'm not going to go through the rest; it would be tedious. However, eventually we reach your /DataSource. This is a name, so we push it on the operand stack. The next thing we encounter is a {; that's a procedure definition, so we push a mark on the operand stack. We then encounter a 417 so we push that, then string, dup, currentfile, exch and readstring, so our stack looks like:
/DataSource
mark
417
string
dup
currentfile
exch
readstring
Then we get the character }. That is the closing mark for an executable array, so we create the array and push it onto the operand stack:
/DataSource
{....}
Then we return to the procedure and continue executing it. The next thing we find is some binary data so we try to execute that as PostScript binary tokens. Because it isn't valid the interpreter throws an error.
Just creating an executable array is not sufficient to actually execute it. If you look at the outline code I posted at the end of edit 3 above you will note that I did not put the readstring and so on in an executable array, I simply allowed the interpreter to execute that code immediately.
By doing so the readstring acts on currentfile (the actual PostScript program in this case) and reads bytes of data from the current point in that file. The current point will be immediately after consuming the white space which terminates the readstring, i.e. the actual binary data. The readstring operator reads enough bytes from the file to fill the string, leaving the string on the operand stack. The file pointer has moved on to the byte after the binary data, and the interpreter resumes token scanning at that point. So it then creates the FilterParams dictionary, puts the /FlateDecode name on the stack and then executes the filter operator, which consumes the name, the dictionary and the string operands, returning a file object. That file object then becomes the value associated with the DataSource key in the image dictionary which is passed to the imagemask operator.
While I haven't tested that code, it's basically correct. There are of course other ways to achieve the same aim.
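The difference between building an executable array and executing tokens immediately can be mimicked with a toy model. The Python sketch below is not a PostScript interpreter: it only knows integers, add, exec and braces, and the function names are invented. It does show that tokens between { and } are merely collected and pushed, and that nothing runs until something executes the array.

```python
def execute(tokens, stack):
    """Run tokens against the operand stack; a { ... } body is pushed, not executed."""
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        i += 1
        if tok == "{":
            body, depth = [], 1
            while True:                  # collect tokens up to the matching }
                tok = tokens[i]
                i += 1
                if tok == "{":
                    depth += 1
                elif tok == "}":
                    depth -= 1
                    if depth == 0:
                        break
                body.append(tok)
            stack.append(body)           # deferred: the body is just data on the stack
        elif tok == "add":
            b = stack.pop()
            a = stack.pop()
            stack.append(a + b)
        elif tok == "exec":
            execute(stack.pop(), stack)  # only now does the collected body run
        else:
            stack.append(int(tok))
    return stack

def run(source: str):
    return execute(source.split(), [])

print(run("1 2 add"))            # executed immediately
print(run("{ 1 2 add }"))        # collected only
print(run("{ 1 2 add } exec"))   # collected, then executed
```

In the same way, wrapping the readstring sequence in braces means it never touches currentfile at the moment the binary data comes next in the file.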
That's basically about as far as I'm prepared to go with this, you need to go and look at what I've written and compare it with your own program.
Note that the simplest way to investigate this is to take the contents of the CharProc (excluding the setcachedevice) and just run that as a PostScript program.