PostScript: Rescaling and resizing page at the same time

For a plotting project of mine I was following the how-to by Eric Weeks to rescale my PostScript page so that one unit corresponds to 1 cm of length. The header of my PS file looks like this:
%!PS
matrix currentmatrix /originmat exch def
/umatrix {originmat matrix concatmatrix setmatrix} def
[28.3465 0 0 28.3465 0 0] umatrix
It does the job, but I also need to resize the page from US Letter to A4. According to, e.g., Postscript - document size, I should call setpagedevice with something like:
<< /PageSize [595 842] >> setpagedevice
However, I cannot make it work. When I put it in front of the matrix redefinition, it has no effect. When I put it after the matrix redefinition (even converting the new size to cm), it resets the matrix scale and the page still comes out as US Letter, only the drawing is scaled down because the coordinates are now in pts.
How can I both rescale the page and define its size?
Edit: I am attaching a MWE presenting my problem. This draws a rectangle that would plot a box around the page, leaving a 10-mm margin if the page was A4-sized. On a letter-sized medium it shows how the paper is shorter but wider.
%!PS
matrix currentmatrix /originmat exch def
/umatrix {originmat matrix concatmatrix setmatrix} def
[28.3465 0 0 28.3465 0 0] umatrix
0.020000 setlinewidth
1.0 1.0 moveto 1.0 27.7 lineto 20.0 27.7 lineto 20 1.0 lineto 1.0 1.0 lineto
stroke

Is there another call of setpagedevice in the file which sets the letter format? The call executed last wins for the page size, and the matrix has to be set up after that call, since setpagedevice resets all graphics settings when it sets up the page device.
If you cannot find where it is being called, you could redefine the setpagedevice operator in userdict. That might work or not, depending on how the whole PostScript file is constructed. In the redefinition you would create a new dict operand to setpagedevice, copy all entries but replace any PageSize by your desired value.
This might do:
userdict /setpagedevice {
dup length dict begin
{
1 index /PageSize eq {
pop [595 842]
} if
def
} forall
currentdict
end
setpagedevice
} bind put
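
The numbers used in this thread (the 28.3465 scale factor and the [595 842] page size) both follow from 1 inch = 72 pt = 2.54 cm. A minimal Python sketch of that arithmetic, with function names that are purely illustrative:

```python
# Unit arithmetic behind the scale factor and the A4 page size.
# All names here are illustrative; nothing below is part of PostScript itself.
POINTS_PER_INCH = 72.0
CM_PER_INCH = 2.54


def points_per_cm():
    """Number of PostScript points in one centimetre."""
    return POINTS_PER_INCH / CM_PER_INCH


def cm_to_points(length_cm):
    """Convert a length in centimetres to PostScript points."""
    return length_cm * points_per_cm()


def a4_page_size_points():
    """A4 is 21.0 cm x 29.7 cm; round to whole points as PageSize usually is."""
    return [round(cm_to_points(21.0)), round(cm_to_points(29.7))]


if __name__ == "__main__":
    print(round(points_per_cm(), 4))
    print(a4_page_size_points())
```

Because setpagedevice resets the graphics state, the PageSize values have to be given in points, while the cm scaling is applied afterwards.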

Images rotated when added to PDF in itext7

I'm using the following extension method I built on top of itext7's com.itextpdf.layout.Document type to apply images to PDF documents in my application:
fun Document.writeImage(imageStream: InputStream, page: Int, x: Float, y: Float, width: Float, height: Float) {
val imageData = ImageDataFactory.create(imageStream.readBytes())
val image = Image(imageData)
val pageHeight = pdfDocument.getPage(page).pageSize.height
image.scaleAbsolute(width, height)
val lowerLeftX = x
val lowerLeftY = pageHeight - y - image.imageScaledHeight
image.setFixedPosition(page, lowerLeftX, lowerLeftY)
add(image)
}
Overall, this works -- but with one exception! I've encountered a subset of documents where the images are placed as if the document origin is rotated 90 degrees. Even though the content of the document is presented properly oriented underneath.
Here is a redacted copy of one of the PDFs I'm experiencing this issue with. I'm wondering if anyone would be able to tell me why itext7 is having difficulties writing to this document, and what I can do to fix it -- or alternatively, if it's a potential bug in the higher level functionality of com.itextpdf.layout in itext7?
Some Additional Notes
I'm aware that drawing on a PDF works via a series of instructions concatenated to the PDF. The code above works on other PDFs we've had issues with in the past, so com.itextpdf.layout.Document does appear to be normalizing the coordinate space prior to drawing. Thus, the issue I describe above seems to be going undetected by itext?
The rotation metadata in the PDF that itext7 reports from a "good" PDF without this issue seems to be the same as the rotation metadata in PDFs like the one I've linked above. This means I can't perform some kind of brute-force fix through detection.
I would love any solution to not require me to flatten the PDF through any form of broad operation.
I can talk only about the document you've shared.
It contains 4 pages.
The /Rotate property of the first page is 0; for the other pages it is 270 (equivalent to a 90-degree counterclockwise rotation).
iText indeed tries to normalize the coordinate space for each page.
That's why, when you add an image to pages 2-4 of the document, it is rotated by 270 degrees (90 degrees counterclockwise).
... Even though the content of the document is presented properly oriented underneath.
The content of pages 2-4 looks like this:
q
0 -612 792 0 0 612 cm
/Im0 Do
Q
This is an image with applied transformation.
0 -612 792 0 0 612 cm represents the composite transformation matrix.
From ISO 32000
A transformation matrix in PDF shall be specified by six numbers,
usually in the form of an array containing six elements. In its most
general form, this array is denoted [a b c d e f]; it can represent
any linear transformation from one coordinate system to another.
We can extract the rotation from that matrix.
How to decompose the matrix is explained here:
https://math.stackexchange.com/questions/237369/given-this-transformation-matrix-how-do-i-decompose-it-into-translation-rotati
The rotation is defined by the following matrix:
0 -1
1 0
This is a rotation by -90 (270) degrees.
Important note: in this case positive angle means counterclockwise rotation.
ISO 32000
Rotations shall be produced by [rc rs -rs rc 0 0], where rc = cos(q)
and rs = sin(q) which has the effect of rotating the coordinate system
axes by an angle q counter clockwise.
So the image has been rotated by the same angle as the page, but in the opposite direction.
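
The decomposition described above can be sketched in Python. This is only an illustration under the assumption that the matrix contains no skew or reflection; the function name decompose_cm is hypothetical and not part of iText.

```python
import math


def decompose_cm(a, b, c, d, e, f):
    """Split a PDF matrix [a b c d e f] into rotation, scale and translation.

    Assumes the matrix is a pure rotation + scale + translation (no skew,
    no reflection). The angle is in degrees, positive = counterclockwise,
    matching the [rc rs -rs rc 0 0] convention quoted from ISO 32000.
    """
    angle = math.degrees(math.atan2(b, a))
    scale_x = math.hypot(a, b)  # length of the transformed x unit vector
    scale_y = math.hypot(c, d)  # length of the transformed y unit vector
    return angle, (scale_x, scale_y), (e, f)


if __name__ == "__main__":
    # Operands of the cm instruction found on pages 2-4.
    print(decompose_cm(0, -612, 792, 0, 0, 612))
```

For the operands 0 -612 792 0 0 612 the angle comes out as -90 degrees, with scale factors 612 and 792 and a translation of (0, 612).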

How to extract rotation/transformation information for PDF extracted images (i.e. how do viewers know to rotate 180°?)

I am using a ScanSnap scanner which generates PDF-1.3 where it will auto-correct the orientation (rotate 0 or 180 degrees) of scanned documents when the PDF is viewed within Adobe Reader. OCR is done by the scanning software and I am assuming the orientation is determined then and encoded into the PDF.
Note that I know I can use Tesseract or other OCR tools to determine if rotation is needed, but I do not want to use it as the scanner software seems to have already determined it and telling PDF viewers if rotation is needed (or not).
When I use image extraction tools (like xpdf pdfimages, python libraries) it does not properly rotate jpeg images 180 degrees (if needed).
NB: pdfimages extracts the raw image data from the PDF file, without
performing any additional transforms. Any rotation, clipping, color
inversion, etc. done by the PDF content stream is ignored.
I have scanned a document twice with rotation (0 degrees, and 180 degrees).
I cannot seem to reverse engineer what is telling Adobe/Foxit to rotate (or not) the image when viewing. I have looked at the PDF-1.3 specification doc, and compared the PDF binary data between the orientation-corrected and not-corrected. I can not determine what is correcting the orientation?
No /Page/Rotate (defaults to 0) in PDF
No EXIF orientation in JPEG
I do not see any transformation matrix (cm operator) in PDF
In both cases the PDF binary looks like the following (stopped at the JPEG streamed data)
UPDATED: links to PDF files rotated-180 rotated-0
%PDF-1.3
%âãÏÓ
1 0 obj
<</Metadata 20 0 R/Pages 2 0 R/Type/Catalog>>
endobj
2 0 obj
<</MediaBox[0.0 0.0 606.6 794.88]/Count 1/Type/Pages/Kids[4 0 R]>>
endobj
4 0 obj
<</Parent 2 0 R/Contents 18 0 R/PieceInfo<</PSL<</Private<</V(3.2.9)>>/LastModified(D:20190201125524-00'00')>>>>/MediaBox[0.0 0.0 606.6 794.88]/Resources<</XObject<</Im0 5 0 R>>/Font<</C0_0 11 0 R/T1_0 16 0 R>>/ProcSet[/PDF/Text/ImageC]>>/Type/Page/LastModified(D:20190201085524-04'00')>>
endobj
5 0 obj
<</Subtype/Image/Length 433576/Filter/DCTDecode/Name/X/BitsPerComponent 8/ColorSpace/DeviceRGB/Width 1685/Height 2208/Type/XObject>>stream
Does anyone know how PDF viewers know whether to rotate an image 180 degrees? Is it metadata within the PDF or the JPEG image that can be extracted? Do Adobe and other viewers do something dynamically on opening a document to determine whether orientation correction is needed?
I'm no expert with PDF specification. But I was hoping someone may have already found a solution to this problem.
The image Im0 in the resources of the page in "internetfile-180.pdf" is not rotated:
But the image Im0 in the resources of the page in "internetfile.pdf" is rotated:
In the viewer both look upright, so in "internetfile.pdf" a technique must be used that rotates the image.
There are two major techniques for this:
Setting the Rotate property of the page accordingly, i.e. here to 180.
Applying a rotation transformation to the current transformation matrix in the content stream of the page.
Let's look at the page dictionary first, a bit pretty-printed:
4 0 obj
<<
/Parent 2 0 R
/Contents 13 0 R
/PieceInfo
<<
/PSL
<<
/Private <</V (3.2.9)>>
/LastModified (D:20190204142537-00'00')
>>
>>
/MediaBox [0.0 0.0 608.64 792.24]
/Resources
<<
/XObject <</Im0 5 0 R>>
/Font <</T1_0 11 0 R>>
/ProcSet [/PDF /Text /ImageC]
>>
/Type /Page
/LastModified (D:20190204102537-04'00')
>>
As we see, there is no Rotate entry present. Thus, we'll have to look at the page content stream. According to the page dictionary it's in object 13, generation 0.
That object is a stream object with deflated stream data:
13 0 obj
<<
/Length 4014
/Filter /FlateDecode
>>
stream
H‰”WÛŽÛF}Ÿ¯Ð[lÀÓÓ÷˾e½
[...]
ÿüòÛÿ ´ß
endstream
endobj
After inflating the stream data, they start like this:
q
-608.3999939 0 0 -792.9600067 608.3999939 792.9600067 cm
/Im0 Do
Q
[...]
And this is indeed an application of the second technique, the cm instruction applies the rotation and the Do instruction paints the image with the rotation active!
In detail, the cm instruction applies the affine transformation represented by the matrix
-608.3999939 0 0
0 -792.9600067 0
608.3999939 792.9600067 1
In other words:
x' = -608.3999939 * x + 608.3999939
y' = -792.9600067 * y + 792.9600067
This transformation actually is a combination of a rotation by 180°, a horizontal scaling by 608.3999939 and a vertical scaling by 792.9600067, and a translation by 608.3999939 horizontally and 792.9600067 vertically.
The Do instruction now paints the image. Here one needs to know that this instruction first scales the image to fit into the unit 1×1 square at the origin and then applies the current transformation matrix.
Thus, the image is drawn rotated by 180°, effectively filling the whole 608.64×792.24 MediaBox of the page.
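
To see the effect of this matrix concretely, one can map the corners of the unit square through it. A small Python sketch (the helper names are illustrative only):

```python
def apply_cm(matrix, x, y):
    """Map the point (x, y) through a PDF matrix [a b c d e f]."""
    a, b, c, d, e, f = matrix
    return (a * x + c * y + e, b * x + d * y + f)


# Operands of the cm instruction quoted above.
SCAN_CM = (-608.3999939, 0, 0, -792.9600067, 608.3999939, 792.9600067)


def unit_square_corners(matrix):
    """Return where the four corners of the image's unit square land."""
    corners = {
        "lower_left": (0, 0),
        "lower_right": (1, 0),
        "upper_left": (0, 1),
        "upper_right": (1, 1),
    }
    return {name: apply_cm(matrix, x, y) for name, (x, y) in corners.items()}


if __name__ == "__main__":
    for name, point in unit_square_corners(SCAN_CM).items():
        print(name, point)
```

The lower-left corner of the image lands at the upper-right corner of the page and vice versa, which is exactly a 180° rotation.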
mkl answered the question correctly doing all the hard work decoding the PDF for me.
I thought I would add in my python (PyPDF2) code to search for the found rotation condition in case it helps someone else.
import re
import PyPDF2

# Matches an optionally signed integer or decimal number
FLOAT_REG = r'([-+]?(?:\d*\.\d+|\d+))'
CM_PATTERN = re.compile(r'\s+'.join([FLOAT_REG] * 6) + r'\s+cm')

input1 = PyPDF2.PdfFileReader(open(filepath, "rb"))
totalPages = input1.getNumPages()
for pgNum in range(0, totalPages):
    page0 = input1.getPage(pgNum)
    # Look for a transformation matrix that rotates the page content 180 degrees
    # (the ScanSnap iX500 encodes the rotation with a cm operator, not /Rotate)
    # see https://stackoverflow.com/questions/54483013/how-to-extract-rotation-transformation-information-for-pdf-extracted-images-i-e
    # see 'PDF 1.3 Reference Manual March 11, 1999' Section 3.10 Transformation matrices, applied to the scanned image:
    #   [[a b 0]
    #    [c d 0]
    #    [e f 1]]
    isPageRotated180 = False
    # latin-1 maps every byte to a character, so binary data cannot break decoding
    pgContent = page0['/Contents'].getData().decode('latin-1')
    m = CM_PATTERN.search(pgContent)
    if m:
        (a, b, c, d, e, f) = map(float, m.groups())
        # A 180-degree rotation negates both axes and translates back onto the page
        isPageRotated180 = (a < 0 and d < 0 and a == -e and d == -f)

Draw rectangle with Ghostscript (using PostScript language)

I'm trying to draw a rectangle and output it to a PDF using Ghostscript.
If I put the following PostScript code in a file named rect.eps, I get what I want:
newpath
100 100 moveto
0 100 rlineto
100 0 rlineto
0 -100 rlineto
-100 0 rlineto
closepath
gsave
0 0 0 setrgbcolor
fill
stroke
showpage
But if I try to include that PostScript into my Ghostscript-command, I just get a blank page:
gs -o rect.pdf -sDEVICE=pdfwrite -g300x300 -c "newpath 100 100 moveto 0 100 rlineto 100 0 rlineto 0 -100 rlineto -100 0 rlineto closepath gsave 0 0 0 setrgbcolor fill stroke showpage"
What am I doing wrong, shouldn't it be possible to draw a rectangle with Ghostscript?
Best Regards
Niclas
Stefan's comment is effectively correct.
You have set a media size in pixels of 300x300. Now given that the pdfwrite device's default resolution is 720 dpi, and you haven't changed that, this means that the media size is less than half an inch in each direction.
You have then drawn a rectangle, starting at 100,100 units on the page, and extending by 100 units in each direction. PostScript units are 1/72 of an inch, so your rectangle's lower left corner begins at just over 1 inch up and right.
That's outside the half-inch square defined by your media, so the result is simply that the rectangle is drawn off the page.
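
The arithmetic behind this can be checked with a few lines of Python. This is only a sketch: the 720 dpi figure is the pdfwrite default mentioned above, and the helper names are made up for illustration.

```python
POINTS_PER_INCH = 72.0


def pixels_to_points(pixels, dpi):
    """Convert a device-pixel length to PostScript points at a given resolution."""
    return pixels * POINTS_PER_INCH / dpi


def points_to_pixels(points, dpi):
    """Convert PostScript points to device pixels at a given resolution."""
    return points * dpi / POINTS_PER_INCH


def rect_on_page(x, y, width, height, page_w, page_h):
    """True if the rectangle lies entirely within a page of page_w x page_h."""
    return x >= 0 and y >= 0 and x + width <= page_w and y + height <= page_h


if __name__ == "__main__":
    side = pixels_to_points(300, 720)  # media side length in points
    print(side)
    print(rect_on_page(100, 100, 100, 100, side, side))
    print(points_to_pixels(300, 720))  # pixels needed for a 300 pt side
```

At 720 dpi, 300 pixels is only 30 points, so a rectangle starting at 100,100 cannot land on the page; a 300 pt square medium would need 3000 pixels per side at that resolution.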
If you don't set the media size Ghostscript will use its default, either A4 or Letter depending, and you will see the output. As to why it works when you make an EPS file, I have no idea, I expect there is content in the EPS that you haven't shared which is making a difference.
When creating a PDF file, which is a resolution-independent format, it's better to specify the media size in resolution-independent units, like PostScript units, than in pixels.
Note that your code has an additional problem, also mentioned by Stefan: the dangling gsave, which looks like it ought to have a grestore before the stroke. As it is, the stroke will do nothing, because fill clears the current path. I suspect you want:
gsave
0 0 0 setrgbcolor
fill
grestore
stroke
showpage

Change size PDF unproportionally by Ghostscript

I have a PDF document with many pages of 595x420 ppi, and I need to squeeze these pages into 595x210 while keeping all text visible.
So, can I scale the PDF pages non-proportionally (no zoom) to fit a custom page size with Ghostscript, or must I use another program?
If you want scaling applied to one axis and not the other, then you will have to do some PostScript programming. In /ghostpdl/Resource/Init/pdf_main.ps is the code which calculates the matrix required:
/pdf_PDF2PS_matrix { % <pdfpagedict> -- matrix
matrix currentmatrix matrix setmatrix exch
% stack: savedCTM <pdfpagedict>
dup get_any_box
% stack: savedCTM <pdfpagedict> /Trim|Crop|Art|MediaBox <Trim|Crop|Art|Media Box>
oforce_elems normrect_elems fix_empty_rect_elems 4 array astore
//systemdict /PDFFitPage known {
PDFDEBUG { (Fiting PDF to imageable area of the page.) = flush } if
That code calculates the x and y scale values and makes them the same. If you want them to differ, that's what you will have to modify. Note you will also have to set a specific media size using -dDEVICEHEIGHTPOINTS and -dDEVICEWIDTHPOINTS and set -dFIXEDMEDIA to prevent the PDF file resizing the media.
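
To illustrate the difference between the two behaviours, here is a small Python sketch of the arithmetic only. It is not Ghostscript's code, and the function names are hypothetical; it merely contrasts a single shared scale factor with independent x and y factors for the page sizes from the question.

```python
def axis_scales(src_w, src_h, dst_w, dst_h):
    """Independent scale factors that map the source box onto the target box."""
    return dst_w / src_w, dst_h / src_h


def uniform_scale(src_w, src_h, dst_w, dst_h):
    """Proportional fit: use the smaller factor on both axes."""
    sx, sy = axis_scales(src_w, src_h, dst_w, dst_h)
    return min(sx, sy)


def non_proportional_matrix(src_w, src_h, dst_w, dst_h):
    """Scaling matrix [sx 0 0 sy 0 0] with different x and y factors."""
    sx, sy = axis_scales(src_w, src_h, dst_w, dst_h)
    return [sx, 0, 0, sy, 0, 0]


if __name__ == "__main__":
    print(uniform_scale(595, 420, 595, 210))
    print(non_proportional_matrix(595, 420, 595, 210))
```

For 595x420 squeezed into 595x210, the proportional fit shrinks both axes by 0.5, whereas the non-proportional matrix keeps x at 1.0 and scales only y by 0.5.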

Rotating a PDF file by n degrees, where n is not a multiple of 90

The problem I am facing is as following. I have a source document, src.pdf.
I need to insert the contents of src.pdf into target.pdf, rotated by n degrees, where n is NOT a multiple of 90.
Any help would be appreciated, thanks.
EDIT 1:
PDF contains no annotations.
I can use any solution which relies on utilities, or write my own code, preferably in C#/Python/Ruby/Perl, but not limited to a language.
The platform is Windows Server 2008 R2, I prefer to stick to the existing server but Linux is also an option. Latest (stable) GhostScript and pdftk are already installed.
If a new language is not a problem, LaTeX could be an option. You can include a PDF as a figure in a tex file, which lets you use dedicated options such as rescaling and rotation. Then compile it to obtain a new PDF.
The following very simple code works for me:
\documentclass[a4paper]{article}
\usepackage{graphicx}
\begin{document}
\includegraphics[scale=0.5,angle=10]{test.pdf}
\end{document}
From this pdf:
I get this new one:
It will, however, need some manual adjustments to get exactly what you want...
You can do it with TeX Live like this:
\documentclass{article}
\usepackage{pdfpages}
\begin{document}
\includepdf[pages={-},angle=30]{main}
\end{document}
It will rotate the entire pdf - every page!
I'm not the one who figured this out, however - check this thread for the original solution (and give that fellow a point!)
This is an example showing how to do that using Java and the iText library. With minimal changes that code should be usable with C# and iTextSharp, too, giving the sample #neo could not provide on short notice in his answer.
The sample takes the first page of source.pdf and inserts it into target.pdf in all multiples of 30°, i.e. of 2*pi/12, but as that angle is explicitly given in the code, you can rotate by any angle.
Document document = new Document();
PdfWriter writer = PdfWriter.getInstance(document, new FileOutputStream("target.pdf"));
document.open();
PdfReader origPdfReader = new PdfReader("source.pdf");
PdfImportedPage importedPage = writer.getImportedPage(origPdfReader, 1);
PdfContentByte canvas = writer.getDirectContent();
for (int i = 0; i < 12; i++)
{
AffineTransform transform = AffineTransform.getRotateInstance(Math.PI * i / 6.0,
importedPage.getWidth() / 2, importedPage.getHeight() / 2);
canvas.addTemplate(importedPage, transform);
document.newPage();
}
document.close();
Depending on your use case you may not only want to rotate (as you asked for) but also to scale it down to fit the page. In that case simply add transform.scale(scaleX, scaleY) before using the transform.
Since you do not have to deal with annotations, you could try using any PDF library of your choice that allows you to decompose PDF dictionaries and decode the page content. Once you get the page content, you can insert a transformation matrix at the beginning of the page: [ cos θ sin θ −sin θ cos θ 0 0 ]
I would recommend taking a look at the PDF Reference Document from Adobe, specifically the section about the transformation matrix.
For example if you have the following page content object (40 0 obj):
10 0 obj % Page object
<< /Type /Page
/Parent 5 0 R
/Resources 20 0 R
/Contents 40 0 R
>>
endobj
40 0 obj % Page content
BT
/F1 1 Tf
12 0 0 12 100 600 Tm
(Hello) Tj
ET
endobj
And you want to rotate the whole page by 45 degrees, approximating cos 45° = sin 45° ≈ 0.7, your resulting page content will be:
40 0 obj
0.7 0.7 -0.7 0.7 0 0 cm
BT
/F1 1 Tf
12 0 0 12 100 600 Tm
(Hello) Tj
ET
endobj
After you finish adding the transformation matrix, you can re-compose your PDF file. The library you have chosen should then add compression filters and encoding filters as needed.
iText for example can decompose and recompose PDF files. See the method PdfReader.getPageContent for details.
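
For an arbitrary angle, the six cm operands can be computed rather than approximated by hand. The following Python sketch is only an illustration; the function names are hypothetical, and the optional centre (cx, cy) is an addition not discussed above, for rotating about a point other than the origin.

```python
import math


def rotation_cm(theta_degrees, cx=0.0, cy=0.0):
    """Return (a, b, c, d, e, f) for a counterclockwise rotation by theta.

    With cx = cy = 0 this is [cos sin -sin cos 0 0]; otherwise the
    translation part keeps the point (cx, cy) fixed.
    """
    t = math.radians(theta_degrees)
    cos_t, sin_t = math.cos(t), math.sin(t)
    e = cx - cos_t * cx + sin_t * cy
    f = cy - sin_t * cx - cos_t * cy
    return (cos_t, sin_t, -sin_t, cos_t, e, f)


def cm_operator(matrix, digits=4):
    """Format the matrix as a content-stream line ending in the cm operator."""
    return " ".join(f"{value:.{digits}f}" for value in matrix) + " cm"


if __name__ == "__main__":
    print(cm_operator(rotation_cm(45)))
    # Rotate about the centre of a 612 x 792 page instead of the origin.
    print(cm_operator(rotation_cm(45, 306, 396)))
```

Rotating about the origin usually moves part of the content off the page, which is why a centre of rotation (or an additional translation) is often needed in practice.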
I wrote some software which can do this:
cpdf -rotate-contents 45 in.pdf -o out.pdf
Commercial, I'm afraid. See Chapter 3 of the manual.