Explanation of part of a PDF file

Explanation of part of a PDF file - pdf

This segment of a PDF file seems to cause Poppler to crash. Xpdf doesn't seem to choke on it. If I remove the /I1 Do and /I2 Do lines, the PDF file works fine. Can someone give me a quick explanation of those might be doing? Let me know if you need to see other parts of the PDF file.
1289 0 obj
<<
/Length 72
>>
stream
q
360.00 0 0 583.20 0 0 cm
/I1 Do
Q
q
360.00 0 0 583.20 0 0 cm
/I2 Do
Q
endstream
endobj

I1 and I2 are either images or form XObjects. Probably for some reason Poppler cannot decode their content and crashes. Even if I see the file I do not know the internals of Poppler so it is difficult to guess what it is causing the problem, unless it is an obvious error in the PDF structure.

Related

How to test ClamAV service for potential threats

As part of an enterprise software project, our application connects to an antivirus service backed by ClamAV, using ICAP as communication protocol. I would like to test the antivirus service response to malicious documents but, of course, I cannot use a document which is actually infected with something malicious. I found EICAR Anti Malware Testfile, but it only seems to come as either a .txt or a .zip and the system only allows upload of Word or PDF. The antivirus service only recognizes EICAR if it is send to it "as-is" but not when embedded inside a Word or PDF.
My question is: how can I create a Word and/or PDF document that is recognized by ClamAV as a threat despite it is actually not harmful at all?

I initially suggested
Since docx is a zip you could try rename eicar.zip as eicar.docx it proves only that a docx is reviewed/scanned similar to a zip, not that the AV can detect malicious VBA macros which would be a different payload.
However, the uploading step, involving Apache Tika file verification, blocked that simplistic approach, as the file type was not as expected.
My second suggestion was
Take a valid docx rename to zip drop the eicar text into it with explorer (or use zip add) and rename to docx as that's likely to bypass Tika checking.
Apparently that worked.
Likewise it should be possible to embed eicar.txt inside a PDF however detection again would not mean the av is scanning for JavaScript exploitation, just that the plain text signature is seen in a PDF file, thus only hints that a PDF is scanned.
This is more difficult due to PDF encryption, but with a hand crafted text file attachment in an editor, it may not be encoded, simply stored as plain text, sufficient basic for the eicar trigger to be seen.
It could look something like this but cut and pasting this binary shown as text will likely fail storage as eicar.pdf due to ansi line endings encoding. so grab a binary copy from link below
%PDF-1.4
%µ¶
1 0 obj
<</Pages 2 0 R/Type/Catalog>>
endobj
2 0 obj
<</Count 1/Kids[3 0 R]/Type/Pages>>
endobj
3 0 obj
<</Contents 4 0 R/MediaBox[0 0 500 800]/Parent 2 0 R/Resources<</Font<</F1 5 0 R>>>>/Type/Page>>
endobj
4 0 obj
<</Length 57>>
stream
q BT /F1 24 Tf 1 0 0 1 50 720 Tm (Hello World!) Tj ET Q
endstream
endobj
5 0 obj
<</BaseFont/Courier/Subtype/Type1/Type/Font>>
endobj
xref
0 6
0000000000 65536 f
0000000016 00000 n
0000000062 00000 n
0000000114 00000 n
0000000227 00000 n
0000000333 00000 n
trailer
<</Size 6/Root 1 0 R/ID[<89311A609A751F1666063E6962E79BD5><FDDAE606D8247DFCBA7D13E1833DEDE3>]>>
startxref
395
%%EOF
%X5O!P%#AP[4\PZX54(P^)7CC)7}$EICAR-STANDARD-ANTIVIRUS-TEST-FILE!$H+H*
%%EOF
temporarily available from https://gofile.io/d/53fylg should look like this
assuming your antivirus allows download :-) try save download as text otherwise I will need to upload as RAR
However those two "Positives" would be just as good a detection as telltales that any AV is searching those file types for current known exploits.
I recommend download the live script running version bottom of this article for deeper testing.
https://blog.didierstevens.com/2015/08/28/test-file-pdf-with-embedded-doc-dropping-eicar/

PDF that renders in Chrome but not in Acrobat

%PDF-1.7
4 0 obj
<</Type/ObjStm/N 3/First 14/Length 139>>
stream
1 0 2 41 3 76 <</Type/Catalog/Version/1.7/Pages 2 0 R>><</Type/Pages/Kids[3 0 R]/Count 1>><</Type/Page/MediaBox[0 0 200 200]/Parent 2 0 R>>
endstream
endobj
5 0 obj
<<
/Root 1 0 R
/ID[<7F1FE2C507E6DB4CB0787E660F2B0C65><2450E4E8FF5FC84380428886C0DD4C2F>]
/Size 6
/Index[1 5]
/W[1 4 1]
/Type/XRef
/Length 68
/Filter[/ASCIIHexDecode]
>>
stream
020000000400
020000000401
020000000402
010000000A00
01000000E500
endstream
endobj
startxref
229
%%EOF
The PDF above opens in Chrome (or Edge), but in Adobe Acrobat (Reader) it crashes. Ghostscript regards it as fine too. Note that it assumes CRLF for line breaks.
I read the parts of the PDF spec that are relevant for a basic PDF, and it seems that the above syntax follows it. Why doesn't Adobe like it?
Here is a link to the PDF. Notice how it opens in Chrome, but crashes in Adobe Acrobat. (This PDF uses LF for line breaks, and has a Resources dictionary on the page, based on the comments.)

Acrobat has the following 2 quirks, both of which do not follow the specs:
If the XRef Stream has a single filter, an array must not be used. So /Filter[/FlateDecode] won't work, and /Filter/FlateDecode will. This may apply to any Stream Object, not sure.
An XRef Stream must use the FlateDecode filter. ASCIIHexDecode won't work. A predictor is not required.
Here is a link to the above PDF, fixed up for Acrobat.

PostScript PDF (1.7), manually writing code

I'm trying to manually write a simple PDF file that contains a title, some text, and an image. I found one example of a manually written "Hello world" and managed to change some things, but I cant get it working for another text object. I have looked for help on the internet but with no luck, I guess not many people write their own PDF files.
This is what I have so far:
%PDF-1.7
1 0 obj % entry point
<<
/Type /Catalog
/Pages 2 0 R
>>
endobj
2 0 obj
<<
/Type /Pages
/MediaBox [ 0 0 200 200 ]
/Count 1
/Kids [ 3 0 R ]
>>
endobj
3 0 obj
<<
/Type /Page
/Parent 2 0 R
/Resources <<
/Font <<
/F1 4 0 R
>>
>>
/Contents 4 0 R
>>
endobj
4 0 obj % page content
<<
/Length 20
>>
stream
BT
80 180 TD
/F1 14 Tf
(PDF) Tj
ET
endstream
endobj
5 0 obj % page content
<<
/Length 20
>>
stream
BT
50 70 TD
/F1 14 Tf
(this is a pdf) Tj
ET
endstream
endobj
trailer
<<
/Size 6
/Root 1 0 R
>>
startxref
492
%%EOF
I have tried adding another text object with "this is a pdf" text but it wont show up, I don't know what could be wrong, I tried changing a few things but with no luck. The image part I don't have it either, so some help with that would be nice.
This is a wiki about the "hello world" pdf I found:
http://www.gnupdf.org/Introduction_to_PDF
Adobe offers some explanation on how the pdf works but I cant find anything that would fix my problem:
http://www.adobe.com/content/dam/Adobe/en/devnet/acrobat/pdfs/pdf_reference_1-7.pdf

This is not a valid PDF. If Acrobat opens it at all it's because it's given up on the xref table and done a full scan of the file, but your PDF is invalid. 4 0 obj is not a font, as you specified, and 5 0 obj is not accessed from anywhere.
PDF specification requires an xref table which points to the exact position in the file for each object. You can't realistically write this by hand unless you intend to manually update the entire xref table every time you add or remove even 1 byte from the file.
You can write a PDF from scratch like this from code easily enough but it will not work to just open a PDF in notepad and start changing things because the index (xref) immediately becomes corrupt.
I'd also advise against putting comments throughout the file unless the comments start on new lines. Otherwise some PDF parsers will get confused as this is generally not expected. Usually PDF files do not contain comments (with the exception of the second line, which is recommended by Adobe to be a comment of some non-ASCII characters so FTP recognizes the file as binary) seeing as they are virtually impossible to write manually anyway.
http://www.adobe.com/devnet/pdf/pdf_reference.html

A few years ago, I wrote a book which covers exactly this sort of thing:
http://www.amazon.com/PDF-Explained-John-Whitington/dp/1449310028/
No free online version, I'm afraid. You can get all the same information from Adobe's own documentation, which is free, but it's a rather long document!

PDF Flag annotations

I try to (programmatically) write the page numbers to all pages in a PDF file.
The object I use to write looks like this:
493 0 obj
<</Length 96>>
stream
Q
/2 12 Tf
/DeviceRGB cs
0 0 0 scn
q
1 0 -0 1 298 32 cm
BT
1 0 0 1 -3.6 1.884 Tm
(2) Tj
ET
Q
endstream
endobj
It worked fine, until I tried to do it on a page which uses the flag "/rotate" :
23 0 obj
<</Parent 2 0 R /Rotate 180 /Contents [492 0 R 24 0 R 493 0 R ] ... >>
...
When tried to do so, the number I wrote came upside down (and in the top of the page instead of bottom).
I read about this in the PDF manual, and found I can use the annotation flags, indicating I want the written number to be fixed, and not effected by page rotation.
For that, I tried to add to the 493 obj dictionary the corresponding flag (NoRotate):
493 0 obj
<</Length 96 /F 16>>
stream
...
The only thing that actually happens is that the number I try to write doesn't show at all.
I tried to load different numbers into the "/F", but they all lead to an invisible number.
I tried to look for examples in the manual and over the net, but didn't find.
What am I doing wrong?
Maybe I place the "/F" in the wrong location??

According to Adobe's PDF Reference v1.7 (link to PDF), 8.4.2 Annotation Flags, the flag /F only applies to annotations -- objects with a /Type of /Annot, and appearing in a PDF as sticky notes, text edits, and clickable rectangles.
It seems you have to provide the rotation yourself, using the Tm operator.

Set PDF to print with no scaling

I am generating a PDF (using fpdf) and I am wondering if there is a way to set the document's properties to to default to print with no scaling.
So when you select print from the print dialogue menu, scaling is set to none. I'm trying to determine if this is a user setting or something I can control in the creation of the PDF.
Thanks in advance.

I've done it adding to the method _putcatalog() the following:
$this->_out('/ViewerPreferences [/PrintScaling/None]');
After the line:
$this->_out('/Type /Catalog');
Implementing a method is just fast and easy...

Print-scaling can be turned off for invividual PDF files using Adobe Acrobat, by going to File -> Preferences -> Advanced -> Page scaling. (You can try this using the trial version of Acrobat.)
As for achieving this in code, I've tried and failed to make it work, but the critical difference in the files seems to be:
10 0 obj
<</Metadata 2 0 R/Outlines 6 0 R/Pages 7 0 R/Type/Catalog/ViewerPreferences<</PrintScaling/None>>>>
endobj
for non-scaling PDFs, compared to
10 0 obj
<</Metadata 2 0 R/Outlines 6 0 R/Pages 7 0 R/Type/Catalog>>
endobj
for those that use the default shrink-to-fit option.

For me changing the FPDF Catalog method _putcatalog() and adding
$this->_out('/ViewerPreferences [/PrintScaling/None]');
wasn't accomplishing the goal so I looked at the code produced by a Acrobate XI PDF and found some more verbage. Adding the following code
$this->_out('/ViewerPreferences<</Duplex/Simplex/Enforce[/PrintScaling]/PrintScaling/None>>');
created a PDF that no longer defaulted to scaling and instead only gave the option to print Actual Size which was what was desired.

Scaling is controlled by the PDF application - it is not set in the file.

well i'm not sure if you mean somethink like this:
http://www.fpdf.org/en/doc/setdisplaymode.htm
or no "scaling" for images?
$im2 = pdf_open_image_file($dokument, 'jpeg', 'example.jpg');
pdf_place_image($dokument, $im2, 395, 655, 1.0); /* 1.0 = qualiti/scaling - 1.0 is original .../*
pdf_close_image($dokument, $im2);

I ran into the same problem.
I have several PDFs where the content of the PDF, that is text and images, go very near the PDFs border but still the print dialogue Preview/Acrobat suggests printing it in 100% scaling, thus cutting off the contents which aren't printable because of the printers natural margins.
Creating any PDF in Pages for example results in a PDF which is printed in 100% scaling by default.
However if I create a PDF using TCPDF which is related to FPDF than the printer dialog suggests to scale it in order to fit the page.
My suspicion is that the way the PDF is created is different. I suspect that Pages and other tools create separate layers and they are then handeled differently, possibly by a flag or something.
I compared the readable parts of my two PDF-Files and did come accross some differences, especially on how the documents begin. My knowledge of the PDF-Sources is, however very limited, so I can only guess what needs to change.
Is there a PDF-Reference where it is stated how to control the printable objects/areas?
Here the content of a minimal PDF which will be printed without scaling:
%PDF-1.4
1 0 obj
<< /Type /Catalog
/Outlines 2 0 R
/Pages 3 0 R
>>
endobj
2 0 obj
<< /Type /Outlines
/Count 0
>>
endobj
3 0 obj
<< /Type /Pages
/Kids [4 0 R]
/Count 1
>>
endobj
4 0 obj
<< /Type /Page
/Parent 3 0 R
/MediaBox [0 0 595 842]
/Contents 5 0 R
/Resources << /ProcSet 6 0 R
/Font << /F1 7 0 R >>
>>
>>
endobj
5 0 obj
<< /Length 73 >>
stream
BT
/F1 24 Tf
100 100 Td
(Hello World) Tj
ET
endstream
endobj
6 0 obj
[ /PDF /Text ]
endobj
7 0 obj
<< /Type /Font
/Subtype /Type1
/Name /F1
/BaseFont /Helvetica
/Encoding /MacRomanEncoding
>>
endobj
xref
0 8
0000000000 65535 f
0000000009 00000 n
0000000074 00000 n
0000000120 00000 n
0000000179 00000 n
0000000364 00000 n
0000000466 00000 n
0000000496 00000 n
trailer
<< /Size 8
/Root 1 0 R
>>
startxref
625
%%EOF

Ok, I think I got it.
Try this: open your TCPDF-created PDF and remove all occurenecs of viewerpreferences and any box-statements other than the MediaBox... doing so finally resulted in a print-default-scaling-free PDF :)
seams like those additional infos -intended for professional printing- only confuse the common pdf-viewer instead of helping with anything :)
Goto tcpdf.php and change line 8529 in method _putpages as follows
change
$boxes = array('MediaBox', 'CropBox', 'BleedBox', 'TrimBox', 'ArtBox');
into
$boxes = array('MediaBox');
In my PDF-output this instantly removed the scaling problem :)

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas

Explanation of part of a PDF file - pdf

I1 and I2 are either images or form XObjects. Probably for some reason Poppler cannot decode their content and crashes. Even if I see the file I do not know the internals of Poppler so it is difficult to guess what it is causing the problem, unless it is an obvious error in the PDF structure.

Related

How to test ClamAV service for potential threats

PDF that renders in Chrome but not in Acrobat

PostScript PDF (1.7), manually writing code

PDF Flag annotations

Set PDF to print with no scaling

Categories

Resources