I'm working with a bunch of PDF files, some of which have been scanned at a bit of an angle. Adobe Acrobat allows me to rotate PDF files by 90 or 180 degrees. But is there a way to rotate a PDF just a few degrees - just enough to make it straighter?
I could perhaps take a screenshot, open it in Photoshop and rotate it, then somehow convert the Photoshop file to a PDF. However, that seems like a really clumsy process.
For whole pages, PDF only supports /Rotate values that are multiples of 90 degrees, because that is (of course) simple. What you need to do is rotate the contents, not the page, so you need to use something which can remake the PDF file for you.
You could use either Ghostscript or MuPDF to do this. Either will require some coding:
MuPDF will require coding in C,
Ghostscript will require you to do some PostScript programming.
Using Ghostscript you would need to define a BeginPage procedure which rotates the content by a small amount and moves the origin of the content slightly as well (because the rotation rotates around the origin, which is at the bottom left, not the centre).
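To make the "move the origin as well" point concrete, here is a minimal Python sketch of the arithmetic, not Ghostscript code. It assumes the usual PDF matrix convention [a b c d e f] and a page whose bottom-left corner is at (0, 0). The function names and the US Letter size used in the example are just illustrative choices of mine.

```python
import math


def rotation_about_centre(degrees, width, height):
    """Return a PDF-style matrix [a, b, c, d, e, f] that rotates content
    counter-clockwise by `degrees` about the centre of a width x height page."""
    theta = math.radians(degrees)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    cx, cy = width / 2.0, height / 2.0
    # A plain rotation pivots on the origin (bottom left). These offsets
    # shift the result so that the page centre ends up where it started.
    e = cx - cx * cos_t + cy * sin_t
    f = cy - cx * sin_t - cy * cos_t
    return [cos_t, sin_t, -sin_t, cos_t, e, f]


def apply(matrix, x, y):
    """Transform the point (x, y) with a PDF-style matrix."""
    a, b, c, d, e, f = matrix
    return (a * x + c * y + e, b * x + d * y + f)


# Example: a 3 degree rotation of a 612 x 792 point page.
matrix = rotation_about_centre(3, 612, 792)
centre = apply(matrix, 306, 396)  # stays at (306, 396) up to rounding
```

The last two entries of the matrix are the shift of the origin that the BeginPage procedure would have to apply in addition to the rotation itself.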
Here is a short utility script for rotating pages (written in Perl). It converts each page of the input PDF to a PDF XObject Form, rotates the form, then outputs the rotated page.
#!/usr/bin/perl
use warnings; use strict;
use PDF::API2;
use Getopt::Long;
my $degrees = 3;
my $scale = 1.0;
my $x = 0;
my $y = 0;
GetOptions ("rotate=i" => \$degrees, "scale=f" => \$scale, "x=f" => \$x, "y=f" => \$y)
    or die "usage: $0 IN_PDF OUT_PDF --rotate=DEG --scale=ALPHA --x=POINTS --y=POINTS";
my $infile = shift (@ARGV);
my $outfile = shift (@ARGV);
my $pdf_in = PDF::API2->open($infile);
my $pdf_out = PDF::API2->new;
foreach my $pagenum (1 .. $pdf_in->pages) {
    my $page_in = $pdf_in->openpage($pagenum);
    #
    # create a new page
    #
    my $page_out = $pdf_out->page(0);
    my @mbox = $page_in->get_mediabox;
    $page_out->mediabox(@mbox);
    my $xo = $pdf_out->importPageIntoForm($pdf_in, $pagenum);
    #
    # lay up the input page in the output page
    # note that you can adjust the position and scale, if required
    #
    my $gfx = $page_out->gfx;
    $gfx->rotate($degrees);
    $gfx->formimage($xo, $x, $y, $scale);
}
$pdf_out->saveas($outfile);
You'll need to ensure the PDF::API2 and Getopt::Long modules are installed from CPAN.
By default the script rotates 3 degrees anticlockwise; this is configurable via the --rotate option.
There are also -x, -y and --scale options to allow fine adjustments of the positioning and scale of the output pages.
This question has also been asked on unix.stackexchange.com.
Another option is using LaTeX:
\documentclass{standalone}
\usepackage{graphicx}
\begin{document}
\includegraphics[angle=-1.5]{odd-scan}
\end{document}
In this case, I have the file odd-scan.pdf (a slightly rotated one page scan) in the same folder as the LaTeX file rotated.tex with the content above and then I run pdflatex rotated.tex. The output is a file rotated.pdf with the PDF rotated by 1.5 degrees clockwise.
(I assume a *nix-style environment. On Windows, you can follow these instructions in Cygwin, although I think you might have to build MuPDF from source there as it doesn't appear to be in the Cygwin repos. If you don't want to do that and you're okay with rasterizing the PDF, ImageMagick is in the Cygwin repos and can do the whole job if needed—see below.)
MuPDF's mutool utility can do this. Say you have a PDF file rotate_me.pdf and you want a version of it rotated by 20° clockwise written to a file rotated.pdf:
#!/bin/bash
mutool draw -R 20 -o rotated.pdf rotate_me.pdf
(mutool draw docs)
You can also rasterize the PDF using mutool convert, work with the image files, and then create a new PDF from them (this assumes rotate_me.pdf has between a hundred and a thousand pages—edit the %3d to your liking):
#!/bin/bash
# - for whatever reason convert's `rotate` is counter-clockwise
# - %nd is replaced with the page number
mutool convert -O rotate=-20 -o 'rotated_%3d.png' rotate_me.pdf
(mutool convert docs)
Once you've done whatever else you need to do to the image files and you're ready to turn them back into a PDF, you can use ImageMagick:
#!/bin/bash
magick convert $(ls | grep -P 'rotated_[0-9]{3}\.png') rotated_finished.pdf
(If you get an error saying the security policy for PDFs doesn't permit this, you may need to edit /etc/ImageMagick-7/policy.xml and comment out or remove the <policy domain="coder" rights="none" pattern="PDF" /> line. Be aware of this Ghostscript pre-v9.24 vulnerability which that security policy may be intended to mitigate. If you're working with files you made yourself, you should be safe here, but you may want to re-enable this policy afterwards depending on your needs and environment. If you're not working with files you made yourself, especially PDFs, be careful, whether you have a pre-v9.24 Ghostscript installed or not. PDF as a format is very complex and offers many different places to squirrel away maliciousness, and practically speaking you can never be 100% confident that the software you're using to work with it is perfectly hardened.)
ImageMagick can also rasterize PDFs on its own, although it's a bit more complicated. For example:
#!/bin/bash
magick convert -density 150 -rotate 20 rotate_me.pdf rotated.pdf
This might look similar to the mutool draw command, but the difference is that ImageMagick will rasterize the input PDF and then use the resulting images to make the output PDF, so you can use all the regular ImageMagick transformations with this command.
Anyway, -density is for DPI. It will default to 72 DPI if you don't pass that argument, which is likely to not look very good. Also, ImageMagick doesn't seem to be quite as smart as MuPDF about margins and things like that as far as PDFs go, so you may need to do more work with it than this to get reasonable output for your use case. If you do have access to both MuPDF and ImageMagick, I think doing the rasterization with MuPDF and then doing further work on the resulting images with ImageMagick tends to give the nicest results with the least work, but of course that may or may not be practical for you.
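To get a feel for what a given -density means in pixels, here is a small Python sketch of the arithmetic. It relies only on the fact that a PDF point is 1/72 inch. The function name and the US Letter page size (612 x 792 points) are my own illustrative assumptions, and the exact dimensions ImageMagick produces may differ slightly because of rounding.

```python
def raster_size(width_pt, height_pt, dpi):
    """Approximate pixel dimensions of a page rendered at `dpi`.

    Page sizes are given in PDF points, where 1 point = 1/72 inch.
    """
    return (round(width_pt / 72 * dpi), round(height_pt / 72 * dpi))


# A US Letter page at the 72 DPI default and at 150 DPI.
default_size = raster_size(612, 792, 72)    # (612, 792)
sharper_size = raster_size(612, 792, 150)   # (1275, 1650)
```

At the default density a Letter page is only 612 pixels wide, which is why the output tends to look rough unless you raise -density.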
(magick convert docs)
Rasterization has obvious disadvantages if your PDF is vector-based: increased file size, fixed resolution, loss of flexibility, etc. Also, even if your PDF is already storing raster graphics, you may lose text data or the like from it in the conversion. If the PDF is really horrible, though, sometimes this is the least painful approach. Once you've cleaned it up, you can OCR it with Tesseract if needed, often with superior results to whatever may have been done before you arrived.
This can be done with cpdf:
cpdf -rotate-contents 5 in.pdf -o out.pdf
(Rotates around the centre of the page by five degrees)
I had this problem at one time. I don't know how many pages you have.
What I did was print the pages that were off, use a paper cutter to square them up, and rescan them. Hope this helps.
And yes, I've tried to find some type of program to fix this, and I still have not found one.
Related
I want to convert Sony raw files (.ARW) to JPG with ImageMagick.
But there is a problem with the white balance (probably).
When I open the files in ACDSee or XnView, they look like the JPG version off the camera, but when I open them in ImageMagick's display, they are much darker and more reddish.
Obviously there is information about color in the RAW file, but ImageMagick cannot interpret it. Is there any way to extract that information and apply it separately?
I am in the process of writing a tool to automatically download and publish photos from the camera, so I tried imagemagick.NET (AnyCPU, v11.1). The conversion program works fine, but the color problem is the same.
Converted with imagemagick:
Converted with XnView (or any other graphics utility)
For anyone coming across this: following Fred Weinhaus' comment, I added this to my VB code:
Dim settings As New MagickReadSettings
settings.Format = MagickFormat.Arw
settings.SetDefine(MagickFormat.Arw, "use-camera-wb", "true")
Using Image As New MagickImage(input, settings)
    ' ... convert and save the image here ...
End Using
I'm trying to convert a page of a PDF to an image. I'm successful with most PDFs I've tried, but this one in particular always ends up with a lot of whitespace on one side or strange scaling.
I've tried every combination of every fixed media, fixed resolution, fit page, use crop/bleed/trim/art box, etc. parameter to fix the issue but nothing does it. The best I get is the right content size but offset and chopped off.
Here's what it should look like, according to every PDF reader I've tried:
Here's a link to the PDF (8 MB) for testing.
https://drive.google.com/file/d/1ErS3KxADb1YAdzM7FG7T5dO8QnW4l1AQ/view?usp=sharing
Edit 1:
Here's what it looks like using just -dUseCropBox without a cropbox override:
I'm using Ghostscript.NET with very simple code. I create a rasterizer, call Open(PDF file, ghostscript dll in bytes), then GetPage(DPI, page number). To use other flags, I add a custom switch to the rasterizer before calling Open.
using(var rasterizer = new GhostscriptRasterizer()) {
//rasterizer.CustomSwitches.Add("-dFIXEDMEDIA");
//rasterizer.CustomSwitches.Add("-dFIXEDRESOLUTION");
//rasterizer.CustomSwitches.Add("-dPSFitPage");
//rasterizer.CustomSwitches.Add("-dFitPage");
//rasterizer.CustomSwitches.Add("-dPDFFitPage");
//rasterizer.CustomSwitches.Add("-dUseCropBox");
//rasterizer.CustomSwitches.Add("-dPrinted");
//rasterizer.CustomSwitches.Add("-dUseBleedBox");
//rasterizer.CustomSwitches.Add("-dUseTrimBox");
//rasterizer.CustomSwitches.Add("-dUseArtBox");
//rasterizer.CustomSwitches.Add("-sPAPERSIZE=letter");
//rasterizer.CustomSwitches.Add("-dORIENT1=true");
//etc
rasterizer.Open(pdfFilePath, ghostscriptDLL);
img = rasterizer.GetPage(dpi, pageNumber);
img.Save(pageFilePath, imageFormat);
}
I'll try again with the latest version of just ghostscript (no .NET) and see if that makes a difference.
Edit 2:
Using just gswin64c version 9.55.0 and -dUseCropBox works as KenS said. Since I don't need Ghostscript.NET to do that, that's a good resolution.
I'm using rst2pdf to collect several images (named A1.png, A2.png, ... etc) from the images folder into one PDF file.
To include one image, I write the following in file.txt:
.. image:: images/A1.png
Then I run the following in a Linux terminal to convert it to PDF:
cat file.txt | rst2pdf -o file.pdf
Is there a way to include all images at once using the name pattern, something like "images/*.png"?
Thank you
I'm not sure I quite understand what you are trying to do. Do you want to convert your images into PDFs? For that, you could try ImageMagick's convert tool: https://imagemagick.org/index.php
If you need to include all images in one PDF, then create an rst file with an image directive for each of the images to include, and rst2pdf will produce a PDF with all the images (or any other restructuredtext content) in it.
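Writing one image directive per file by hand gets tedious, so the rst file can be generated. Below is a minimal Python sketch that does this for every file matching a pattern. The function names are made up for illustration, and the "images" folder and "*.png" pattern are taken from the question. It sorts names so that A2.png comes before A10.png, which a plain alphabetical sort would not do.

```python
import re
from pathlib import Path


def natural_key(path):
    """Sort key that compares the digit runs in a file name numerically."""
    parts = re.split(r"(\d+)", path.name)
    return [int(p) if p.isdigit() else p for p in parts]


def build_rst(image_dir, pattern="*.png"):
    """Return reStructuredText with one image directive per matching file."""
    paths = sorted(Path(image_dir).glob(pattern), key=natural_key)
    # Each directive is followed by a blank line so that they stay separate.
    return "\n".join(f".. image:: {p.as_posix()}\n" for p in paths)
```

You could write the result of build_rst("images") to file.txt and then run rst2pdf on that file as before.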
I'm trying to use Ghostscript to create a PDF with multiple identical pages. I will later use this together with another multi-page PDF to stamp unique information onto every page.
Is it possible to use Ghostscript to create such a PDF and keep the size of the final file down? Maybe there is a flag that I have not noticed that can do this in a better way than the script below?
I have tried to use a regular merge command like the one below, but the size of the resulting PDF grows a lot: the original file size of 2,061MB merged to a 100-page PDF results in a final size of 46,117MB.
"C:\Program Files\gs\gs9.20\bin\gswin64.exe"^
-dBATCH^
-dNOPAUSE^
-q^
-sDEVICE=pdfwrite^
-sOutputFile=outputpdf.pdf^
"inputpdf.pdf"^
"inputpdf.pdf"^
"inputpdf.pdf"(and so on 100 times)
You can construct such a file manually easily enough, and it will be much smaller, by reusing the page content stream for each page.
However, Ghostscript's pdfwrite device won't do that, not least because it can't. It cannot know in advance that the page it's about to receive is the same as the previous page. As a result it will create a new page content stream for each page, and create new content for it.
Note that resources (forms, patterns, colour spaces, image XObjects etc) which are used on each page will be reused on other pages.
However, it seems to me that you're already getting nearly a 5:1 ratio (2k * 100 pages = 200Kb, the final file is 46Kb) though in fairness a good bit of that 2Kb is 'stuff' around the page.
Without seeing your input file I can't really comment any further, but frankly I doubt it's possible to make it any smaller without hand-crafting the file. What's the problem with a 46Kb file anyway?
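For what it's worth, hand-crafting a file in which every page reuses one content stream can be sketched in a few lines of Python. This is only an illustration of the idea, not something Ghostscript does. It writes a bare-bones PDF from scratch rather than reading your input file, and the function name, page size, font and sample text are all assumptions of mine.

```python
def build_pdf(num_pages, content=b"BT /F1 24 Tf 72 720 Td (Identical page) Tj ET"):
    """Return the bytes of a PDF whose pages all point at one content stream."""
    # Object numbers: 1 catalog, 2 page tree, 3 font, 4 shared stream, 5.. pages.
    kids = " ".join(f"{5 + i} 0 R" for i in range(num_pages))
    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        f"<< /Type /Pages /Count {num_pages} /Kids [{kids}] >>".encode(),
        b"<< /Type /Font /Subtype /Type1 /BaseFont /Helvetica >>",
        b"<< /Length %d >>\nstream\n" % len(content) + content + b"\nendstream",
    ]
    # Every page object is identical and refers to the same stream (object 4).
    page = (b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] "
            b"/Resources << /Font << /F1 3 0 R >> >> /Contents 4 0 R >>")
    objects.extend([page] * num_pages)

    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for number, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset needed for the xref table
        out += b"%d 0 obj\n" % number + body + b"\nendobj\n"

    xref_pos = len(out)
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f \n"
    for offset in offsets:
        out += b"%010d 00000 n \n" % offset
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\n" % (len(objects) + 1)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_pos
    return bytes(out)
```

Writing build_pdf(100) to a file gives a 100-page document in which the drawing instructions are stored once, so each extra page only costs a small page object and an xref entry.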
I want to use beamer to project slides onto one screen and my notes onto a second screen. Beamer's show notes on second screen option is designed for this purpose. It requires the pgfpages package, and it is supposed to create PDF pages of ordinary height but twice the ordinary width, so that half of the page can be projected onto one screen, half onto the other.
The option works as intended when I use pdflatex. But when I use xelatex (from MiKTeX 2.9), I get pages of only the normal width. The pages are my normal slides; my "note" slides are not created. Here is an example:
\documentclass{beamer}
\usepackage{pgfpages}
\setbeameroption{show notes on second screen=right}
\begin{document}
\begin{frame}{Note test}
\begin{itemize}
\item<1-> Eggs
\item<2-> Plants
\note[item]<2>{Tell joke about plants.}
\end{itemize}
\end{frame}
\end{document}
When I use pdflatex, this code produces a PDF file of double width, with note slides on the right. When I use xelatex, it produces a PDF file of normal width, and no note slides are included. Changing the first line to \documentclass[xelatex]{beamer} makes no difference.
Is there anything that I can do to make the show notes on second screen option work with xelatex?
I am using beamer 3.27 and pgfpages 0.02 (which is distributed with v3.0 of the pgf package).
Adding these lines solves the problem:
\renewcommand\pgfsetupphysicalpagesizes{%
\pdfpagewidth\pgfphysicalwidth\pdfpageheight\pgfphysicalheight%
}
Credit to Tomáš Janoušek, who provided the answer in this post to the XeTeX mailing list: http://www.tug.org/pipermail/xetex/2009-June/013325.html.