Getting only the first few characters when converting from PDF to image with ImageMagick - pdf

I'm using ImageMagick in PHP in order to convert the first page of a PDF file written in Japanese into an image file.
My problem is that only the first few characters are displayed in the image.
I tried everything. The PDF file is correct and the code doesn't return any error or warning.
$im = new Imagick();
$im->setResolution(400,400);
$im->readimage($pdf_file.'[0]');
$im->setImageFormat('png');
$im->posterizeImage(1000,false);
$im->writeImage($image_file);
$im->clear();
$im->destroy();
The first line of the PDF file is :
委託者________(所在地:________)(以下「委託者」という)と受託者________(所在
But the image only displays:
委託者________(所在地:________)(以下「委託者

Related

failed to load an image when converting html to pdf with pandoc

I use jekyll to generate html files having an image like:
<img src="/assets/images/view.png" alt="" />
When I generate a PDF with pandoc with that HTML, it shows:
Warning: Failed to load file:///assets/images/view.png (ignore)
The resulting PDF doesn't contain the image.
I think that's because the image path is absolute, so pandoc resolves it against the root of the file system. I have tried --resource-path=assets/images/, but it doesn't help. Does anyone know how to load the images successfully in this case?
There might be a simpler solution, but I'd solve this by using a Lua filter that fixes the image path:
function Image (img)
-- remove leading slash from image paths
img.src = img.src:gsub('^/', '')
return img
end
Save this to a file fix-img.lua and pass the file to pandoc via the --lua-filter option.
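To illustrate what the filter's pattern does, here is the same substitution sketched in Python. This is only an illustration: the function name is hypothetical, and pandoc itself runs the Lua version above. The `^/` pattern is anchored to the start of the string, so only a single leading slash is removed.

```python
import re


def strip_leading_slash(src: str) -> str:
    """Mirror the Lua gsub('^/', '') call: drop one leading slash, if any."""
    return re.sub(r"^/", "", src)


if __name__ == "__main__":
    # Hypothetical sample paths, used only to show the effect of the pattern.
    for path in ["/assets/images/view.png", "assets/images/view.png"]:
        print(path, "->", strip_leading_slash(path))
```

With the slash gone the path is relative, so pandoc would presumably have to be run from the site's root directory (or be given a matching --resource-path) for the image to be found.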

GROFF PDFPIC converted w ImageMagick to .ms document causes "troff: sample.ms:18: division by zero" and leads images to show very right of the pdf doc

I converted my original image to pdf with ImageMagick. If viewed independently, the pdf image looks perfectly normal.
sample.ms :
.PDFPIC Figure_1.pdf
Once I try to compile my .ms document with the following command:
groff -ms sample.ms -U -T pdf > sample.pdf
I get the following error from groff:
troff: sample.ms:1: division by zero
The document does compile, but the image ends up far to the right of the page, sometimes almost completely off the page.
I was having the same problem, and it seems that the PDFs generated by convert are corrupt in some way.
I ended up using convert img.png img.tiff and then tiff2pdf img.tiff > img.pdf. Including img.pdf then worked just fine.
I used tiff2pdf just because that's what I had installed, but any other program should work too if it generates valid PDF.
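For repeated use, the two steps could be wrapped in a small script. The following is a minimal sketch in Python, assuming convert and tiff2pdf are both on the PATH; the function names are hypothetical, and it uses tiff2pdf's -o option for the output file instead of the shell redirection shown above.

```python
import subprocess
from pathlib import Path
from typing import List


def build_commands(png: str) -> List[List[str]]:
    """Return the two commands (PNG -> TIFF, TIFF -> PDF) for one image."""
    src = Path(png)
    tiff = src.with_suffix(".tiff")
    pdf = src.with_suffix(".pdf")
    return [
        ["convert", str(src), str(tiff)],         # ImageMagick: PNG to TIFF
        ["tiff2pdf", "-o", str(pdf), str(tiff)],  # libtiff tool: TIFF to PDF
    ]


def png_to_pdf(png: str) -> None:
    """Run both steps in order; raises CalledProcessError if either fails."""
    for cmd in build_commands(png):
        subprocess.run(cmd, check=True)
```

Calling png_to_pdf("img.png") would leave img.tiff and img.pdf next to the original file.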

Writing a basic PostScript script by hand

I wanted to try and manually code a PostScript file. Why? Why not. From Wikipedia, I copied and pasted their basic Hello World program for PostScript which is:
%!PS
/Courier % name the desired font
20 selectfont % choose the size in points and establish
% the font as the current one
72 500 moveto % position the current point at
% coordinates 72, 500 (the origin is at the
% lower-left corner of the page)
(Hello world!) show % stroke the text in parentheses
showpage % print all on the page
When I try to open it in GIMP, I get
Opening 'Hello World.ps' failed. Could not interpret file 'Hello World.ps'
I can use ImageMagick to convert the file
convert "Hello World.ps" "Hello World.pdf"
convert "Hello World.ps" "Hello World.eps"
The PDF opens successfully and displays 'Hello World' in Courier.
The EPS yields the same error as the PS.
Is there something wrong with the syntax of the PS file?
Are PS files just not meant to be viewed directly, and should instead be viewed in a containing format like PDF?
Is GIMP just not able to handle this particular format of PS file?
To answer your questions, one by one:
1. Your PostScript file is completely OK.
2. PostScript files can be viewed directly if you use a PostScript-capable viewer. (BTW: PDF may be regarded as a 'container format', but it never embeds a PostScript file for 'viewing'...)
3. For GIMP to be able to handle PS/EPS files, you need a working Ghostscript installation on your system.
4. The same as point '3.' is true for your convert command: ImageMagick cannot handle PS/EPS or PDF input files unless there is a functional Ghostscript installation available on the local system. Ghostscript works as a so-called 'delegate', employed by ImageMagick to handle file formats which it cannot handle itself. A delegate converts such a format into a raster file, which ImageMagick can then take over for further processing.
To check for available ImageMagick delegates, run these commands:
convert -list delegate
convert -list delegate | grep -Ei --color '(eps|ps|pdf)'
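Since both GIMP and ImageMagick depend on Ghostscript here, it can also help to check whether a Ghostscript executable is on the PATH at all. A minimal sketch in Python, assuming the usual executable names (gs on Unix-like systems, gswin64c or gswin32c on Windows); the function name is hypothetical, and an installation under a different name or outside the PATH would not be detected.

```python
import shutil
from typing import Optional, Sequence


def find_ghostscript(
    candidates: Sequence[str] = ("gs", "gswin64c", "gswin32c"),
) -> Optional[str]:
    """Return the full path of the first candidate found on PATH, else None."""
    for name in candidates:
        path = shutil.which(name)
        if path:
            return path
    return None


if __name__ == "__main__":
    found = find_ghostscript()
    print(found or "No Ghostscript executable found on PATH")
```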

Add pdf file in Rmarkdown file

Is it possible to display a pdf file in an Rmarkdown file? Say, for example, a previously saved image myimage.pdf
What I am aiming for is something like
\includegraphics[width=0.5]{myimage.pdf}
An update from the very end of 2017
It is possible to import a PDF image using knitr::include_graphics(), e.g.
```{r label, out.width = "85%", fig.cap = "caption"}
include_graphics("path-to-your-image.pdf")
```
You can insert this directly into your R Markdown. The alternate name will only be displayed if the image does not load.
![Alternate Name](file.pdf){width=500px}
For the record, and as suggested in the comments, this works perfectly fine as long as you're knitting to PDF (it is a LaTeX command, so it won't work if you knit to HTML or docx):
\includegraphics[width = \textwidth]{myimage.pdf}
You can change the width with something like
\includegraphics[width = 0.5\textwidth]{myimage.pdf}

Strange behaviour of a pdf-to-text conversion

I'm trying to convert a PDF document to .txt using pdftotext on a Linux Mint machine. The document is written in English, but the output text looks something like this:
23!,&/$!%+!,#$!AB&017"*&7!"-M')(!-)!gE*X/-&$!$-&23!')!,#$!
(-.$1!/*/-223!(/-&-)E ,$$*!,#-,!,#$!%,#$&!C2-3$&!>'22!($,!
,#$!-9[-0$),!0%&)$&7!S/0#!-!*',/-,'%)!'*!*$$)!')!V'-E
(&-.!Z7!I,!'*!##',$8*!,/&)1!D/,!)%!.-,,$&!>#$&$!##',$!(%$
*1!^2-0M!>'22!D$!-D2$!,%!+2'C!,#$ 9'*0!%)!,#$!gE*X/-&$!N(
KO!-)9!,-M$!,#$!#<!0%&
Is there an encoding problem? Maybe a wrong option in the command line?
Edit: the problem is the same even if I copy a bunch of text from the PDF document and paste it into a text document.
Edit #2: The Producer pdf property is: Mac OS X 10.5.6 Quartz PDFContext, the encoding for most of the fonts is WinAnsi or MacRoman. Maybe this can help.
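One way to test the encoding hypothesis is to check whether the garbage is a consistent character-for-character substitution, which is what text extraction tends to produce when a font's character codes do not map back to standard characters. A minimal sketch in Python, assuming a partial mapping guessed by eye from the sample above (the table is a guess, not something read from the PDF, and the names are hypothetical):

```python
# Partial substitution table guessed from the sample output (an assumption).
GUESSED = {
    ",": "t", "#": "h", "-": "a", "$": "e", "!": " ",
    "%": "o", "&": "r", "C": "p", "2": "l", "3": "y",
    ">": "w", "'": "i", "(": "g",
}


def decode(garbled: str) -> str:
    """Replace each known character; mark unknown ones with '?'."""
    return "".join(GUESSED.get(ch, "?") for ch in garbled)


if __name__ == "__main__":
    # Fragment copied from the end of the second line of the sample.
    fragment = ",#-,!,#$!%,#$&!C2-3$&!>'22!($,!"
    print(repr(decode(fragment)))
```

With this guessed table the fragment decodes to "that the other player will get ", which is readable English. If the rest of the text follows the same pattern, that would point to the font encoding inside the PDF rather than to a pdftotext option.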