Why does this Ink Annotation show a curve instead of straight lines? - pdf

I created an ink annotation in the shape of the letter Z. However, the corners of the Z are rendered rounded. Why does this happen? How can I avoid this additional beautification so that the corners stay sharp and the points are connected by straight lines?
PDF code:
%PDF-1.6
%μῦ
1 0 obj
<</Type/Catalog/Pages 2 0 R>>
endobj
2 0 obj
<</Type/Pages/Kids[3 0 R]/Count 1>>
endobj
3 0 obj
<</Type/Page/Parent 2 0 R/Resources<<>>/MediaBox[0 0 500 800]/Annots[4 0 R]>>
endobj
4 0 obj
<</Type/Annot/Subtype/Ink/Contents(<enter description here>)/InkList[[150 300 250 300 150 200 250 200]]/Rect[200 250 300 150]/P 3 0 R/F 4/C[1 0 0]>>
endobj
xref
0 5
0000000000 65535 f
0000000017 00000 n
0000000063 00000 n
0000000115 00000 n
0000000209 00000 n
trailer
<</Size 5/Root 1 0 R>>
startxref
374
%%EOF

I'm not sure what's going on here, and have no time to investigate right now.
But here are some immediate observations as additional data points:
Mac OS X's Preview.app shows the Z-shape with sharp edges.
Adobe Reader X and Adobe Acrobat X Pro (on Mac OS X) show the Z-shape with round edges.
Ghostscript v9.05 shows the Z-shape with round edges.
Ghostscript, self-compiled from today's Git repository, shows the Z-shape with round edges.
Update:
Ok, I had a quick look into the official ISO spec for PDF-1.7. It says this about the /Subtype /Ink annotations' /InkList:
An array of n arrays, each representing a stroked path. Each array shall be a series of alternating horizontal and vertical coordinates in default user space, specifying points along the path. When drawn, the points shall be connected by straight lines or curves in an implementation-dependent way.
(from Chapter 12.5.6.13 Ink Annotations, my emphasis)
So, it is completely 'legal' that one implementation shows straight lines, and the other one shows curves. :-(
Sigh...
Update 2:
So if you want to force the Z-shape to appear as straight lines in all implementations, you need to draw 3 separate straight lines, each represented by its own array, and put these 3 arrays into the container array...
Change this part of your code:
/InkList
[
[150 300 250 300 150 200 250 200]
]
to this:
/InkList
[
[150 300 250 300]
[250 300 150 200]
[150 200 250 200]
]
and your Z-Shape will show sharp corners.
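If you generate the annotation programmatically, the same split can be automated. The following is only a sketch in Python: the helper names split_ink_path and format_ink_list are made up for illustration and are not part of any PDF library.

```python
def split_ink_path(coords):
    """Split a flat [x0, y0, x1, y1, ...] ink path into two-point segments."""
    if len(coords) % 2 != 0 or len(coords) < 4:
        raise ValueError("need an even number of coordinates and at least two points")
    points = list(zip(coords[0::2], coords[1::2]))
    # Pair every point with its successor: one short path per straight line
    return [[x0, y0, x1, y1] for (x0, y0), (x1, y1) in zip(points, points[1:])]


def format_ink_list(paths):
    """Render a list of paths as the value of an /InkList entry."""
    inner = "".join("[" + " ".join(str(n) for n in path) + "]" for path in paths)
    return "/InkList[" + inner + "]"


z_path = [150, 300, 250, 300, 150, 200, 250, 200]
segments = split_ink_path(z_path)
print(format_ink_list(segments))
```

For the Z from the question this yields the three two-point arrays listed above.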

The solution is to also create the annotation appearance (the /AP entry in the annotation dictionary). If an appearance that draws the straight lines is present in the PDF file, it will be used when displaying the file, and you will get the same result in any viewer. If the appearance is not present, the viewer constructs one from the annotation definition, and in your case this viewer-built appearance is implementation-dependent.
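As a rough illustration of what such an appearance could contain, here is a Python sketch that turns the InkList points into path operators and wraps them in a form XObject. The function names, the object number 5 and the 1-unit padding are assumptions made for this example. The annotation would additionally need an /AP<</N 5 0 R>> entry, a /Rect matching the /BBox, and an updated xref table.

```python
def ink_appearance_stream(paths, rgb=(1, 0, 0), width=1):
    """Build content stream operators that stroke each path with straight lines."""
    ops = ["%g %g %g RG" % rgb, "%g w" % width]
    for coords in paths:
        points = list(zip(coords[0::2], coords[1::2]))
        ops.append("%g %g m" % points[0])             # move to the first point
        ops.extend("%g %g l" % p for p in points[1:])  # straight line to each next point
        ops.append("S")                                # stroke the path
    return "\n".join(ops)


def bounding_box(paths, pad=1):
    """Smallest box around all points, padded so the stroke is not clipped."""
    xs = [x for coords in paths for x in coords[0::2]]
    ys = [y for coords in paths for y in coords[1::2]]
    return [min(xs) - pad, min(ys) - pad, max(xs) + pad, max(ys) + pad]


def appearance_xobject(obj_num, paths):
    """Serialize the appearance as a form XObject (obj_num is hypothetical)."""
    content = ink_appearance_stream(paths)
    bbox = " ".join("%g" % v for v in bounding_box(paths))
    return ("%d 0 obj\n<</Type/XObject/Subtype/Form/BBox[%s]/Length %d>>\n"
            "stream\n%s\nendstream\nendobj\n" % (obj_num, bbox, len(content), content))


z_paths = [[150, 300, 250, 300, 150, 200, 250, 200]]
print(appearance_xobject(5, z_paths))
```

Because the default line join in PDF is a miter join, the corners of the stroked path come out sharp without any extra operator.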

Postscript: Rescaling and resizing page at the same time

For a plotting project of mine I was following the How-to by Eric Weeks to rescale my PostScript page so that one unit matches 1 cm of length. The header of my PS file looks like this:
%!PS
matrix currentmatrix /originmat exch def
/umatrix {originmat matrix concatmatrix setmatrix} def
[28.3465 0 0 28.3465 0 0] umatrix
It does the job but the other thing I need is to resize the page from US Letter to A4. According to, e.g., Postscript - document size I should setpagedevice with something like:
<< /PageSize [595 842] >> setpagedevice
However I cannot make it work. When I put it in front of the matrix redefinition, it has no effect. When I put it after the matrix redefinition (even translating the new size to cm) it resets the matrix scale and the page still comes out as US Letter, only the drawing is scaled down because the coordinates are now in pts.
How can I both rescale the page and define its size?
Edit: I am attaching a MWE presenting my problem. This draws a rectangle that would plot a box around the page, leaving a 10-mm margin if the page was A4-sized. On a letter-sized medium it shows how the paper is shorter but wider.
%!PS
matrix currentmatrix /originmat exch def
/umatrix {originmat matrix concatmatrix setmatrix} def
[28.3465 0 0 28.3465 0 0] umatrix
0.020000 setlinewidth
1.0 1.0 moveto 1.0 27.7 lineto 20.0 27.7 lineto 20 1.0 lineto 1.0 1.0 lineto
stroke
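For reference, the numbers used here can be checked with a short calculation. This is a Python sketch only. The constant and function names are chosen for this example, and 612 x 792 pt is the usual US Letter size.

```python
POINTS_PER_INCH = 72.0
CM_PER_INCH = 2.54
POINTS_PER_CM = POINTS_PER_INCH / CM_PER_INCH  # the 28.3465 scale factor


def cm_to_pt(cm):
    """Convert centimetres to PostScript points."""
    return cm * POINTS_PER_CM


def pt_to_cm(pt):
    """Convert PostScript points to centimetres."""
    return pt / POINTS_PER_CM


a4_cm = (21.0, 29.7)
letter_pt = (612, 792)

print(round(POINTS_PER_CM, 4))                     # scale factor in the header
print([round(cm_to_pt(v)) for v in a4_cm])         # A4 in whole points
print([round(pt_to_cm(v), 2) for v in letter_pt])  # US Letter in cm
```

A4 rounds to 595 x 842 pt, and US Letter is 21.59 x 27.94 cm, which is why the box fits an A4 page with a 1 cm margin but sits close to the top edge of a Letter page.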
Is there another call of setpagedevice in the file which sets the letter format? The call executed last wins for the page size, and setting up the matrix differently has to be done after that call, since setting up the page device resets all graphics state settings.
If you cannot find where it is being called, you could redefine the setpagedevice operator in userdict. That might work or not, depending on how the whole PostScript file is constructed. In the redefinition you would create a new dict operand to setpagedevice, copy all entries but replace any PageSize by your desired value.
This might do:
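userdict /setpagedevice {
  dup length dict begin        % new dict, sized like the operand dict
  {
    1 index /PageSize eq {     % is this key /PageSize?
      pop [595 842]            % then replace its value with A4
    } if
    def                        % copy the key/value pair into the new dict
  } forall
  currentdict
  end
  setpagedevice                % pass the patched dict on
} bind put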

How to extract rotation/transformation information for PDF extracted images (i.e. how do viewers know to rotate 180°)

I am using a ScanSnap scanner which generates PDF-1.3 where it will auto-correct the orientation (rotate 0 or 180 degrees) of scanned documents when the PDF is viewed within Adobe Reader. OCR is done by the scanning software and I am assuming the orientation is determined then and encoded into the PDF.
Note that I know I can use Tesseract or other OCR tools to determine if rotation is needed, but I do not want to, as the scanner software seems to have already determined it and tells PDF viewers whether rotation is needed (or not).
When I use image extraction tools (like xpdf pdfimages, python libraries) it does not properly rotate jpeg images 180 degrees (if needed).
NB: pdfimages extracts the raw image data from the PDF file, without
performing any additional transforms. Any rotation, clipping, color
inversion, etc. done by the PDF content stream is ignored.
I have scanned a document twice with rotation (0 degrees, and 180 degrees).
I cannot seem to reverse engineer what is telling Adobe/Foxit to rotate (or not) the image when viewing. I have looked at the PDF-1.3 specification doc, and compared the PDF binary data between the orientation-corrected and not-corrected. I can not determine what is correcting the orientation?
No /Page/Rotate (defaults to 0) in PDF
No EXIF orientation in JPEG
I do not see any transformation matrix (cm operator) in PDF
In both cases the PDF binary looks like the following (stopped at the JPEG streamed data)
UPDATED: links to PDF files rotated-180 rotated-0
%PDF-1.3
%âãÏÓ
1 0 obj
<</Metadata 20 0 R/Pages 2 0 R/Type/Catalog>>
endobj
2 0 obj
<</MediaBox[0.0 0.0 606.6 794.88]/Count 1/Type/Pages/Kids[4 0 R]>>
endobj
4 0 obj
<</Parent 2 0 R/Contents 18 0 R/PieceInfo<</PSL<</Private<</V(3.2.9)>>/LastModified(D:20190201125524-00'00')>>>>/MediaBox[0.0 0.0 606.6 794.88]/Resources<</XObject<</Im0 5 0 R>>/Font<</C0_0 11 0 R/T1_0 16 0 R>>/ProcSet[/PDF/Text/ImageC]>>/Type/Page/LastModified(D:20190201085524-04'00')>>
endobj
5 0 obj
<</Subtype/Image/Length 433576/Filter/DCTDecode/Name/X/BitsPerComponent 8/ColorSpace/DeviceRGB/Width 1685/Height 2208/Type/XObject>>stream
Does anyone know how PDF viewers know to rotate an image 180° (or not)? Is it metadata within the PDF or JPEG image which can be extracted? Do Adobe and other viewers do something dynamically on opening a document to determine if orientation correction is needed?
I'm no expert with PDF specification. But I was hoping someone may have already found a solution to this problem.
The image Im0 in the resources of the page in "internetfile-180.pdf" is not rotated:
But the image Im0 in the resources of the page in "internetfile.pdf" is rotated:
In the viewer both look upright, so in "internetfile.pdf" a technique must be used that rotates the image.
There are two major techniques for this:
Setting the Rotate property of the page accordingly, i.e. here to 180.
Applying a rotation transformation to the current transformation matrix in the content stream of the page.
Let's look at the page dictionary first, a bit pretty-printed:
4 0 obj
<<
/Parent 2 0 R
/Contents 13 0 R
/PieceInfo
<<
/PSL
<<
/Private <</V (3.2.9)>>
/LastModified (D:20190204142537-00'00')
>>
>>
/MediaBox [0.0 0.0 608.64 792.24]
/Resources
<<
/XObject <</Im0 5 0 R>>
/Font <</T1_0 11 0 R>>
/ProcSet [/PDF /Text /ImageC]
>>
/Type /Page
/LastModified (D:20190204102537-04'00')
>>
As we see, there is no Rotate entry present. Thus, we'll have to look at the page content stream. According to the page dictionary it's in object 13, generation 0.
That object is a stream object with deflated stream data:
13 0 obj
<<
/Length 4014
/Filter /FlateDecode
>>
stream
H‰”WÛŽÛF}Ÿ¯Ð[lÀÓÓ÷˾e½
[...]
ÿüòÛÿ ´ß
endstream
endobj
After inflating the stream data, they start like this:
q
-608.3999939 0 0 -792.9600067 608.3999939 792.9600067 cm
/Im0 Do
Q
[...]
And this is indeed an application of the second technique, the cm instruction applies the rotation and the Do instruction paints the image with the rotation active!
In detail, the cm instruction applies the affine transformation represented by the matrix
-608.3999939 0 0
0 -792.9600067 0
608.3999939 792.9600067 1
In other words:
x' = -608.3999939 * x + 608.3999939
y' = -792.9600067 * y + 792.9600067
This transformation actually is a combination of a rotation by 180°, a horizontal scaling by 608.3999939 and a vertical scaling by 792.9600067, and a translation by 608.3999939 horizontally and 792.9600067 vertically.
The Do instruction now paints the image. Here one needs to know that this instruction first scales the image to fit into the unit 1×1 square at the origin and then applies the current transformation matrix.
Thus, the image is drawn rotated by 180°, effectively filling the whole 608.64×792.24 MediaBox of the page.
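To see the rotation numerically, one can push the corners of the unit square through the matrix. A small Python sketch follows. The function name apply_cm is made up for this example.

```python
def apply_cm(matrix, point):
    """Apply a PDF matrix (a b c d e f) to a point: x' = a*x + c*y + e, y' = b*x + d*y + f."""
    a, b, c, d, e, f = matrix
    x, y = point
    return (a * x + c * y + e, b * x + d * y + f)


cm = (-608.3999939, 0, 0, -792.9600067, 608.3999939, 792.9600067)

# Corners of the unit square into which the Do operator maps the image
corners = {
    "bottom-left": (0, 0),
    "bottom-right": (1, 0),
    "top-left": (0, 1),
    "top-right": (1, 1),
}
for name, corner in corners.items():
    print(name, "->", apply_cm(cm, corner))
```

The bottom-left corner of the image lands at the top right of the page and the top-right corner lands at the origin, which is exactly a 180° turn.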
mkl answered the question correctly doing all the hard work decoding the PDF for me.
I thought I would add in my python (PyPDF2) code to search for the found rotation condition in case it helps someone else.
import re

import PyPDF2  # uses the PyPDF2 1.x API (PdfFileReader, getPage, getData)

# One PDF number; the optional sign applies to both the decimal and the integer form
FLOAT_REG = r'([-+]?(?:\d*\.\d+|\d+))'
CM_REG = re.compile(r'\s+'.join([FLOAT_REG] * 6) + r'\s+cm')

# filepath: path to the scanned PDF, defined elsewhere
input1 = PyPDF2.PdfFileReader(open(filepath, "rb"))
totalPages = input1.getNumPages()
for pgNum in range(0, totalPages):
    page0 = input1.getPage(pgNum)
    # Look for a transformation matrix in the page content that rotates it 180 degrees
    # (the ScanSnap iX500 encodes the PDF with a cm transformation matrix that rotates the image 180 degrees in PDF viewers)
    # see https://stackoverflow.com/questions/54483013/how-to-extract-rotation-transformation-information-for-pdf-extracted-images-i-e
    # see 'PDF 1.3 Reference Manual March 11, 1999', Section 3.10 Transformation matrices, applied to the scanned image
    #   [[a b 0]
    #    [c d 0]
    #    [e f 1]]
    isPageRotated180 = False
    # latin-1 maps every byte to a character, so binary data in the stream cannot raise a decode error
    pgContent = page0['/Contents'].getData().decode('latin-1')
    m = CM_REG.search(pgContent)
    if m:
        (a, b, c, d, e, f) = map(float, m.groups())
        isPageRotated180 = (a == -e and d == -f)

Create Highlight PDF annotations with Ghostscript

I have the following PostScript file containing a pdfmark to create a highlight annotation:
%PS
/Courier 30 selectfont
15 15 moveto
(Test)show
[ /Rect [0 0 80 30]
/Subtype /Highlight
/Color [.8 .8 0]
/QuadPoints [10 40 90 40 10 10 90 10]
/Contents (Test annotation)
/ANN pdfmark
showpage
(Note that the coordinates of the /QuadPoints field are not in the order the specs define, as Adobe implements it differently.)
Ghostscript creates a PDF with an annotation from that, but there are two issues:
It works in Adobe Reader and Okular, but it's not clickable in Evince.
More important: The highlighted area isn't a rectangle but has rounded left and right edges, as can be seen from the following screenshot:
Why is that and how can I get straight edges?
You should start by looking at the content of the PDF file and seeing what Ghostscript (or more accurately the pdfwrite device) has put in there. Posting an example PDF file to look at would be a sensible move too, and would also tell us which version of Ghostscript you are using.
BTW that header should be %!PS, you missed off the '!'. Of course since it's a comment it doesn't matter to the PostScript interpreter.
Now here's the output from Adobe Acrobat Distiller for the annotation, using the code in your question:
1 0 obj
<</Type/Annot/Subtype/Highlight/Rect[0 0 80 30]/C[.8 .8 0]/QuadPoints[10 40 90 40 10 10 90 10]/Contents(Test annotation)>>
endobj
And here's the same from Ghostscript's pdfwrite device:
8 0 obj
<</Type/Annot
/Rect [0 0 80 30]
/C [0.8 0.8 0]
/QuadPoints [10 40 90 40 10 10 90 10]
/Contents(Test annotation)
/Subtype/Highlight>>endobj
These are essentially identical.
So to answer your questions:
If it works in Acrobat, then perhaps you should ask the Evince developers this question.
The rounded edges are drawn by the application which reads the PDF annotation. Since Acrobat draws them that way, everyone else does the same (including Ghostscript's PDF interpreter). If you don't like it you will have to change the viewing application.

PDF Low-Level: Drawing a line in the content object?

I have searched extensively online and I have the PDF specification in which I have looked, yet I still can't figure out how to draw a simple black line on a PDF page from the content object's instructions (stream).
Let's say I just want to draw a 1-pixel thickness (assuming 72 dpi) black line at x 400, y 100-300.
This should in theory be a very simple operation, but the PDF spec goes on and on about all kinds of fancy things and appears to forget to explain how I would go about performing this simple operation.
Please can someone point me in the right direction?
In the PDF specification, have a look at chapter 8 (Graphics) and in there section 8.5, Path Construction and Painting.
To draw a simple straight path, you need a "move to" operation followed by a "line to" operation:
400 100 m
400 300 l
You can then stroke the path using the S operator so your code becomes
400 100 m
400 300 l
S
By default the color is black so you've already gotten a black line :-) But if you want to make sure you have to set some parameters in the graphics state.
0 G
1 w
400 100 m
400 300 l
S
The first line now sets the stroking color space to "gray" and the gray level to 0 (black). The following line sets the line width of your stroked line to 1 user unit (what this comes out as depends on your current transformation matrix).
You can apply a neat trick if you really want 1 pixel (please don't for production files though!) and that is to set the width to zero:
0 w
This gives you "the thinnest line that can be rendered at device resolution: 1 device pixel wide".
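If you want to try these operators without a PDF library, a minimal file can be assembled by hand. The Python sketch below is one way to do it under a few assumptions: the function name build_pdf and the 500 x 800 page size are chosen for this example, and the stream is left uncompressed.

```python
LINE_OPS = "0 G\n1 w\n400 100 m\n400 300 l\nS"


def build_pdf(content, width=500, height=800):
    """Assemble a one-page PDF whose page content stream is `content`."""
    stream = content.encode("ascii")
    objects = [
        b"<</Type/Catalog/Pages 2 0 R>>",
        b"<</Type/Pages/Kids[3 0 R]/Count 1>>",
        ("<</Type/Page/Parent 2 0 R/Resources<<>>/MediaBox[0 0 %d %d]/Contents 4 0 R>>"
         % (width, height)).encode("ascii"),
        b"<</Length %d>>\nstream\n" % len(stream) + stream + b"\nendstream",
    ]
    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for number, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset of "N 0 obj", needed by the xref table
        out += b"%d 0 obj\n" % number + body + b"\nendobj\n"
    xref_pos = len(out)
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f \n"  # object 0 is always the free-list head
    for offset in offsets:
        out += b"%010d 00000 n \n" % offset
    out += (b"trailer\n<</Size %d/Root 1 0 R>>\nstartxref\n%d\n%%%%EOF\n"
            % (len(objects) + 1, xref_pos))
    return bytes(out)


pdf_bytes = build_pdf(LINE_OPS)
print(len(pdf_bytes), "bytes")
```

Writing pdf_bytes to a file opened in binary mode gives a document you can open in a viewer to inspect the line.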

Rotating a PDF file by n degrees, where n is not a multiple of 90

The problem I am facing is as following. I have a source document, src.pdf.
I need to insert the contents of src.pdf into target.pdf, rotated by n degrees, where n is NOT a multiple of 90.
Any help would be appreciated, thanks.
EDIT 1:
PDF contains no annotations.
I can use any solution which relies on utilities, or write my own code, preferably in C#/Python/Ruby/Perl, but not limited to a language.
The platform is Windows Server 2008 R2, I prefer to stick to the existing server but Linux is also an option. Latest (stable) GhostScript and pdftk are already installed.
If a new language is not a problem, LaTeX could be an option. You can include a PDF as a figure in a TeX file and use dedicated options such as scaling and rotation. Then compile it to obtain a new PDF.
The very simple following code works for me :
\documentclass[a4paper]{article}
\usepackage{graphicx}
\begin{document}
\includegraphics[scale=0.5,angle=10]{test.pdf}
\end{document}
From this pdf:
I get this new one:
It will however need some manual adjustments to get exactly what you want...
You can do it with TeX Live like this:
\documentclass{article}
\usepackage{pdfpages}
\begin{document}
\includepdf[pages={-},angle=30]{main}
\end{document}
It will rotate the entire pdf - every page!
I'm not the one who figured this out, however - check this thread for the original solution (and give that fellow a point!)
This is an example showing how to do that using Java and the iText library. With minimal changes that code should be usable with C# and iTextSharp, too, giving the sample #neo could not provide on short notice in his answer.
The sample takes the first page of source.pdf and inserts it into target.pdf in all multiples of 30°, i.e. of 2*pi/12, but as that angle is explicitly given in the code, you can rotate by any angle.
Document document = new Document();
PdfWriter writer = PdfWriter.getInstance(document, new FileOutputStream("target.pdf"));
document.open();
PdfReader origPdfReader = new PdfReader("source.pdf");
PdfImportedPage importedPage = writer.getImportedPage(origPdfReader, 1);
PdfContentByte canvas = writer.getDirectContent();
for (int i = 0; i < 12; i++)
{
AffineTransform transform = AffineTransform.getRotateInstance(Math.PI * i / 6.0,
importedPage.getWidth() / 2, importedPage.getHeight() / 2);
canvas.addTemplate(importedPage, transform);
document.newPage();
}
document.close();
Depending on your use case you may not only want to rotate (as you asked for) but also to scale it down to fit the page. In that case simply add transform.scale(scaleX, scaleY) before using the transform.
Since you do not have to deal with annotations, you could try using any PDF library of your choice that allows you to decompose PDF dictionaries and decode the page content. Once you get the page content, you can insert a transformation matrix at the beginning of the page: [ cos θ sin θ −sin θ cos θ 0 0 ]
I would recommend taking a look at the PDF Reference Document from Adobe, specifically the section about the transformation matrix.
For example if you have the following page content object (40 0 obj):
10 0 obj % Page object
<< /Type /Page
/Parent 5 0 R
/Resources 20 0 R
/Contents 40 0 R
>>
endobj
40 0 obj % Page content
BT
/F1 1 Tf
12 0 0 12 100 600 Tm
(Hello) Tj
ET
endobj
And you want to rotate the whole page by 45 degrees, approximating cos(45°) = sin(45°) ≈ 0.7 (more precisely 0.7071), your resulting page content will be:
40 0 obj
0.7 0.7 -0.7 0.7 0 0 cm
BT
/F1 1 Tf
12 0 0 12 100 600 Tm
(Hello) Tj
ET
endobj
After you finish adding the transformation matrix, you can re-compose your PDF file. The library you have chosen should then add compression filters and encoding filters as needed.
iText for example can decompose and recompose PDF files. See the method PdfReader.getPageContent for details.
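The six numbers for an arbitrary angle can be computed rather than approximated by hand. Below is a Python sketch with made-up helper names (rotation_cm, format_cm). It also supports rotating about a point other than the origin, which is usually what you want for a page, by folding the translation into the last two operands.

```python
import math


def rotation_cm(degrees, cx=0.0, cy=0.0):
    """Operands (a b c d e f) of a cm that rotates counterclockwise
    by `degrees` about the point (cx, cy)."""
    theta = math.radians(degrees)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    # e and f are chosen so that (cx, cy) maps onto itself
    e = cx - cos_t * cx + sin_t * cy
    f = cy - sin_t * cx - cos_t * cy
    return (cos_t, sin_t, -sin_t, cos_t, e, f)


def format_cm(operands):
    """Format the operands as a content stream instruction."""
    return " ".join("%.6f" % v for v in operands) + " cm"


print(format_cm(rotation_cm(45)))            # rotation about the origin
print(format_cm(rotation_cm(30, 306, 396)))  # rotation about the centre of a 612 x 792 page
```

Wrapping the original page content in q ... Q after the inserted cm keeps the rotation from leaking into anything appended later.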
I wrote some software which can do this:
cpdf -rotate-contents 45 in.pdf -o out.pdf
Commercial, I'm afraid. See Chapter 3 of the manual.