Trying to understand data in cross-reference (XRef) stream in PDF - pdf

I'm trying to read a PDF file that is linearized and uses cross-reference streams. I believe that I mostly understand what's happening except for the last two entries in the table. Those two, for objects 5 and 6, appear to be in use but show file offsets that vastly exceed the file size. Also, the PDF file I have doesn't even have objects number 5 or 6 in it.Here is the cross-reference stream:
4 0 obj
<</DecodeParms<</Columns 4/Predictor 12>>/Filter/FlateDecode/ID[<ED772C59D33BA74FA1DEE567740067A0><ED772C59D33BA74FA1DEE567740067A0>]/Info 6 0 R/Length 39/Root 8 0 R/Size 7/Type/XRef/W[1 3 0]>>stream
hfibb&F…ˆl&fit ¡ÿ"∏ôügÕ≤=‘
endstream
And here are the raw data after FlateDecode, arranged in rows. FlateDecode reports that 35 bytes of data were inflated.
02 00 00 00 00
02 01 19 87 6b
02 00 00 0d 67
02 00 00 01 8c
02 00 00 01 0b
02 01 e7 6a 99
02 00 00 00 01
I also applied a PNG Predictor function (up) which yielded 7 rows of 4 bytes each:
00 00 00 00
01 19 87 6b
01 19 94 d2
00 00 0e f3
00 00 02 97
01 e7 6b a4
01 e7 6a 9a
Row 0 is all zero, check. The offsets for object 1 and 2 do in fact address object 1 and 2 in the PDF file. So far, so good. Object 3 is marked unused, and for sure there is no object 3 in the PDF file.
But then, I'm a little confused that object 4, this cross-reference stream, is marked as unused. Still, since it is object 4 that I am parsing, I've clearly had no difficulty finding it.But where I am completely confused are the rows for object 5 and 6. The "01" in the first column tells me that they are in use. But their offsets exceed the size of the entire file, and in any case, there are no object 5 nor 6 in the file. The Size entry in the dictionary clearly has a value of 7, telling me the table should contain data for objects 0 thru 6. After filtering, I have 28 bytes of data, which makes sense for seven rows of four bytes each.Why are entries for 5 and 6 there at all? And, given that they are there, why are they marked as "in use" with apparently nonsense offsets?The file seems valid. Both Adobe Illustrator and Acrobat Reader open it without complaint. I haven't found anything in the PDF spec about special treatment for the last two rows of an Xref stream. What am I missing?

You interpret the predictor to add the current input row and the previous input row to retrieve the current data row. Shouldn't you add the current input row and the previous data row? That would change results for object 3 onward:
02 00 00 00 00 00 00 00 00
02 01 19 87 6b 01 19 87 6b
02 00 00 0d 67 01 19 94 d2
02 00 00 01 8c 01 19 95 5e
02 00 00 01 0b 01 19 96 69
02 01 e7 6a 99 02 00 00 02
02 00 00 00 01 02 00 00 03
Now objects 3 and 4 have proper offsets matching the data from your pastebin paste and objects 5 and 6 would be marked as objects in object streams.

Related

How to identify an IDR frame from the Video packet

I need to be able to inspect ts packets (looking at video PID 481) and determine whether the packet contains an IDR frame. My understanding is that I need to look for a NAL unit start code, and then something else after that to signify it's the start of an IDR frame. Please can someone clarify?
Here's an example of a packet that I think is an IDR frame, but need to be able to prove it from the payload data:
* Packet 2
---- TS Header ----
PID: 481 (0x01E1), header size: 12, sync: 0x47
Error: 0, unit start: 1, priority: 0
Scrambling: 0, continuity counter: 1
Adaptation field: yes (8 bytes), payload: yes (176 bytes)
Discontinuity: 1, random access: 1, ES priority: 0
PCR: 0x000000013A5
---- PES Header ----
Stream id: 0xE0 (Video 0)
PES packet length: 0 (unbounded)
---- Full TS Packet Content ----
47 41 E1 31 07 D0 00 00 00 08 7E E5 00 00 01 E0 00 00 84 C0 0A 31 00 05
E5 CD 11 00 05 AD 8D 00 00 00 01 09 10 00 00 00 01 67 64 00 29 AC D9 40
78 04 4F DE 02 94 04 04 05 00 00 03 00 01 00 00 03 00 32 E6 80 00 F4 24
00 04 F5 8A 49 30 0F 8B 16 CB 00 00 00 01 68 FA A7 CB 00 00 01 06 00 05
95 6C 60 E4 85 80 00 00 01 06 05 FF FF F5 DC 45 E9 BD E6 D9 48 B7 96 2C
D8 20 D9 23 EE EF 78 32 36 34 20 2D 20 63 6F 72 65 20 31 34 38 20 2D 20
48 2E 32 36 34 2F 4D 50 45 47 2D 34 20 41 56 43 20 63 6F 64 65 63 20 2D
20 43 6F 70 79 72 69 67 68 74 20 32 30 30 33 2D 32 30 31 36
Its not possible to tell from that packet. It is however VERY likely it is an IDR. I say its likely, because looking at the NALUs, I can see an AUD 00 00 00 01 09, an SPS 00 00 00 01 67 a PPS 00 00 00 01 68 then an SEI 00 00 01 06
The SEI however takes the remaining bytes of the packet, You will need to continue reading packets from that PID until you fine the next NALU and see if its an IDR.

How to Parse ISO 8583 message

How can I determine where the MTI start in an ISO 8583 message?
00 1F 60 00 05 80 53 08 00 20 20 01 00 00 80 00 00 92 00 00 00 31 07 00 05 31 32 33 34 31 32 33 34
In that message 00 1F is the length, and 60 00 05 80 53 is the TPDU. (Those aren't part of ISO8583). 08 00 is the MTI. The next 8 bytes are the primary bitmap.
You can buy a copy of the ISO8583 specification from ISO. There is an introduction on wikipedia
position of the MTI is network specific, and should be explained in their technical specifications document.
You can eyeball the MTI by looking for values such as 0100, 0110, 0220, 0230, 0800, etc. in the first 20 bytes, and the are typically followed by 8 to 16 bytes of BMP data
your data shows MTI = 800 with a bitmap = 20 20 01 00 00 80 00 00
That means the following fields are present, 3,11,24,41, with DE 3 (PRoc code) = 920000, DE 11 (STAN) = 003107, and the remaining is shared among 24 and 41, I am not sure about their sizes
In this message a 2 byte header length is used:
00 1F
But some Hosts also use 4 byte header length for ISO 8583 messages. So you can not generalize it, it depends on what you have arranged with the sending Host.

USB SCSI Inquiry and RequestSense

I am currently writing USB(EHCI) driver for pendrives in assembly for my OS.
I can successfully get the Device-, String-, Configuration(Interface and Endpoint)-descriptors.
However, the SCSI Inquiry and other commnads fail.
This is what I do (after getMaxLUN):
SetConfiguration(1)
Inquiry
Another way:
SetConfiguration(1)
TestUnitReady
RequestSense
TestUnitReady
I correctly set the EndPt in the QueueHead (BulkIn or BulkOut) and the device-address.
I am implementing the USB-driver by the book "B. D. Lunt - USB: The Universal Serial Bus".
If I send Inquiry, this is what I get (return buffer, 36 bytes, hex):
01 00 00 00 01 00 00 00 80 0D 24 00
40 15 16 00 00 20 16 00 00 30 16 00
00 40 16 00 00 50 16 00 00 00 00 00
the 11th byte is 0x24, the length of the returned data (36).
This looks garbage or an error message.
If I send TestUnitReady, then RequestSense returns (18 bytes, hex):
01 00 00 00 01 00 00 00 80 0D 12 00
C0 16 16 00 00 20
the 11th byte is 0x12, the length of the returned data (18).
The contents of the two returned buffers are similar. The CBW of Inquiry and TestUnitReady look good (I use LUN=0). The SCSI-spec-4 says something about an incorrect logical-unit, if the result is not the expected one, but I have only LUN 0.
I am testing with a Sony 4GB pendrive.
Inquiry should work immediately (return a correct data in the buffer) regardless of the CSW after it and I don't understand what the problem can be.
I am experimenting with BulkReset and Clear_Feature(for BulkIn and BulkOut) but that doesn't seem to fix the Inquiry.
The CSW of the TestUnitReady returns Status=01 but after a BulkReset and ClearFeature it returns Status=0 (success).
What can be the reason for the incorrect data in the buffer(s) ?
Any help is appreciated.
EDIT: I tried it with delays too (100ms before sending CBWs, getting CSWs).
It didn't help.
EDIT2: calling GetConfiguration() right after SetConfiguration(1), correctly returns 1.
EDIT3: This must be due to an incorrect pointer. I consider it solved.

reading a pdf version >= 1.5, how to handle Cross Reference Stream Dictionary

I'm trying to read the xref table of a pdf version >= 1.5.
the xref table is an object:
58 0 obj
<</DecodeParms<</Columns 4/Predictor 12>>/Filter/FlateDecode/ID[<CB05990F613E2FCB6120F059A2BCA25B><E2ED9D17A60FB145B03010B70517FC30>]/Index[38 39]/Info 37 0 R/Length 96/Prev 67529/Root 39 0 R/Size 77/Type/XRef/W[1 2 1]>>stream
hÞbbd``b`:$AD`­Ì ‰Õ Vˆ8âXAÄ×HÈ$€t¨ – ÁwHp·‚ŒZ$ìÄb!&F†­ .#5‰ÿŒ>(more here but can't paste)
endstream
endobj
as you can see
/FlatDecode
/Index [38 39], that is 39 entries in the stream
/W [1 2 1] that is each entry is 1 + 2 + 1 = 4 bytes long
/Root 39 0 R that is root object is number 39
BUT :
the decompressed stream is 195 bytes long (39 * 5 = 195). So the length of an entry is 4 or 5.
Here is the first inflated bytes
02 01 00 10 00 02 00 02 cd 00 02 00 01 51 00 02 00 01 70 00 02 00 05 7a 00 02
^^
if entry length is 4 then the root entry is a free object (see the ^^) !!
if the entry is 5: how to interpret the fields of one entry (reference is implicitly made to PDF Reference, chapter 3.4.7 table 3.16 ) ?
For object 38, the first of the stream: it seems, as it is of type 2, to be the 16 object of the stream object number 256, but there is no object 256 in my pdf file !!!
The question is: how shall I handle the 195 bytes ?
A compressed xref table may have been compressed with one of the PNG filters. If the /Predictor value is set to '10' or greater ("a Predictor value greater than or equal to 10 merely indicates that a PNG predictor is in use; the specific predictor function used is explicitly encoded in the incoming data")1, PNG row filters are supplied inside the compressed data "as usual" (i.e., in the first byte of each 'row', where the 'row' is of the width in /W).
Width [1 2 1] plus Predictor byte:
02 01 00 10 00
02 00 02 cd 00
02 00 01 51 00
02 00 01 70 00
02 00 05 7a 00
02 .. .. .. ..
After applying the row filters ('2', or 'up', for all of these rows), you get this:
01 00 10 00
01 02 ed 00
01 03 3e 00
01 04 ae 00
01 09 28 00
.. .. .. ..
Note: calculated by hand; I might have made the odd mistake here and there. Note that the PNG 'up' filter is a byte filter, and the result of the "up" filter is truncated to 8 bits for each addition.
This leads to the following Type 1 XRef references ("type 1 entries define objects that are in use but are not compressed (corresponding to n entries in a cross-reference table)."):2
#38 type 1: offset 10h, generation 0
#39 type 1: offset 2EDh, generation 0
#40 type 1: offset 33Eh, generation 0
#41 type 1: offset 4AEh, generation 0
#42 type 1: offset 928h, generation 0
1 See LZW and Flate Predictor Functions in PDF Reference 1.7, 6th Ed, Section 3.3: Filters.
2 As described in your Table 3.16 in PDF Ref 1.7.

understand hexedit of an elf

Consider the following hexedit display of an ELF file.
00000000 7F 45 4C 46 01 01 01 00 00 00 00 00 .ELF........
0000000C 00 00 00 00 02 00 03 00 01 00 00 00 ............
00000018 30 83 04 08 34 00 00 00 50 14 00 00 0...4...P...
00000024 00 00 00 00 34 00 20 00 08 00 28 00 ....4. ...(.
00000030 24 00 21 00 06 00 00 00 34 00 00 00 $.!.....4...
0000003C 34 80 04 08 34 80 04 08 00 01 00 00 4...4.......
00000048 00 01 00 00 05 00 00 00 04 00 00 00 ............
How many section headers does it have?
Is it an object file or an executable file?
How many program headers does it have?
If there are any program headers, what does the first program header do?
If there are any section headers, at what offset is the section header table?
Strange, this hexdump looks like your homework to me...
There are 36 section headers.
It is an executable.
It has 8 program headers.
As you can tell by the first word (offset 0x34: 0x0006) in the first program header, it is of type PT_PHDR, which just informs about the characteristics of the program header table itself.
The section header table begins at byte 5200 (which is 0x1450 in hex).
How do I know this stuff? By dumping the hex into a binary and reading it with readelf -a (because I am lazy). Except for question no. 4, which I had to figure out manually by reading man 5 elf.