VxD: Determine the size of file segments for code and data and their location - header

I have a sample VxD here that simply shows a message box whenever a DOS prompt is opened or closed under Windows 98.
What I need to know is which header fields determine the code segment or page, the data segment, and their sizes in bytes or pages.
The file is an old Linear Executable, which for VxDs supposedly always has the signature "LE".
So far I know that:
MemPagesCtr tells how many blocks of MemPageSZ bytes there are. I have set MemPageSZ to 512 bytes and things have stayed sane so far.
LastPageBytes says how many bytes the very last page of the VxD file holds, which can be less than or equal to MemPageSZ.
Which header fields indicate which of the pages hold code and which hold data, or are they always in the same sequence in a VxD LE file? And how do I find out where the actual code entry point and the data start?
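As a quick check of the two fields described above, the number of bytes occupied by the page data can be worked out from them. This is only a sketch under the interpretation given in this question (every page except the last is MemPageSZ bytes long, and the last one holds LastPageBytes); the function name is made up for illustration, and the values are the ones from the header listing.

```python
def page_data_size(mem_pages_ctr, mem_page_sz, last_page_bytes):
    """Bytes occupied by all pages, assuming only the last page may be short."""
    if mem_pages_ctr == 0:
        return 0
    return (mem_pages_ctr - 1) * mem_page_sz + last_page_bytes


# Values from the header: MemPagesCtr = 3, MemPageSZ = 512, LastPageBytes = 0x31
total = page_data_size(3, 512, 0x31)
print(total)  # two full 512-byte pages plus 0x31 bytes
```

The 0x31 (49) bytes of the last page agree with the three zero-terminated strings in page 2 of the listing (15 + 16 + 18 bytes).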
;http://master.dl.sourceforge.net/project/api-simple-completa/api.7z?viasf=1
%include "../../../PC/BIOS/WinAPI/EXE/PE/snippets/DOS_Stubs/NASM/DOS_Stub_classic0xB0.asm"
PE_header:
LE_header:
LE_header_start:
LE_header_0x00_Signature db 'LE'
LE_header_0x02_ByteOrder db 0 ;0 Little or Big Endian
LE_header_0x03_WordOrder db 0 ;0 Little or Big Endian
LE_header_0x04_FormatLevel dd 0
LE_header_0x08_CPUType dw 2 ;1 286+, 2 386+, 3 486+, 4 586+, 20h i860 N10+, 21h N11+, 40h MIPS Mark I+ (R2000, R3000), 41h MIPS Mark II+ (R6000), 42h MIPS Mark III+ (R4000)
LE_header_0x0A_TargetOS dw 4 ;1 OS/2, 2 Win16, 3 DOS 4x, 4 Win386
LE_header_0x0C_ModuleVersion dd 0
LE_header_0x10_Flags dd 101000000000000000b ;
LE_header_0x14_MemPagesCtr dd 3
LE_header_0x18_InitialCS dd 0
LE_header_0x1C_InitialEIP dd 0 ;Explain how the VxD finds and loads its code and data sections and addresses
LE_header_0x20_InitialSS dd 0
LE_header_0x24_InitialESP dd 0
LE_header_0x28_MemPageSZ dd 512 ;Data size
LE_header_0x2C_LastPageBytes dd 0x31
LE_header_0x30_FixUpSectionSZ dd 0x85
LE_header_0x34_FixUpSectionCheck dd 0
LE_header_0x38_LdrSectionSZ dd 0x69
LE_header_0x3C_LdrSectionCheck dd 0
LE_header_0x40_ObjectTableOffset dd 0xC4 ;from where?
LE_header_0x44_ObjectTableEntries dd 3
LE_header_0x48_ObjectPageMapOffset dd 0x10C
LE_header_0x4C_ObjectIterDataMapOff dd 0 ;always 0 for VxD?
LE_header_0x50_RsrcTableOffset dd 0
LE_header_0x54_RsrcTableEntries dd 0
LE_header_0x58_ResidentTableOffset dd 0x118
LE_header_0x5C_EntryTableOffset dd 0x123
LE_header_0x60_ModulDirectivesTable dd 0 ;always 0 for VxD?
LE_header_0x64_ModulDirectivesEnts dd 0
LE_header_0x68_FixupPageTblOffset dd 0x12D
LE_header_0x6C_FixupRecordTable dd 0x13D
LE_header_0x70_ImportModulesTable dd 0x1B2
LE_header_0x74_ImportModulesCount dd 0
LE_header_0x78_ImportProcTable dd 0x1B2
LE_header_0x7C_PerPageChecksums dd 0
LE_header_0x80_DataPagesOffset dd 0x400 ;0x1000; 0x400;0x2E0;0x1000 ;start of code and data after the header
LE_header_0x84_PreloadPageCount dd 1
LE_header_0x88_NonResidentNames dd 0x1434
LE_header_0x8c_NonResidentNamesLen8 dd 15 ;in bytes
LE_header_0x90_NonResidentNamesChk dd 0
LE_header_0x94_AutomaticDataObject dd 0
LE_header_0x98_DebugInformationOff dd 0
LE_header_0x9C_DebugInformationLen dd 0
LE_header_0xA0_PreloadInstPagesNum dd 0
LE_header_0xA4_DemandInstancePgsNum dd 0
LE_header_0xA8_ExtraHeapAllocation dd 0
LE_header_0xAC dd 0
;File Offset 0x160, LE header offset 0xB0
;INIT:
;INIT:
;INIT:
;INIT:
times 16 db 0
dd 0x4000000
dd 0xE8
dd 0
dd 0x2045
dd 1
dd 1
db "LCOD'",0,0,0
dd 0
dd 0x2005
dd 2
dd 1
db "PCOD1",0,0,0
dd 0
dd 0x2023
dd 3
dd 1
db "PDAT"
dd 0x10000
db 0,0,2,0
db 0,0,3,0
db 7
db "MESSAGE"
db 0,0,0,1,3,1,0,3,0,0,0,0,0,0,0,0
db 0,0x60,0,0,0,0x75,0,0,0,0x75,0,0,0,7,0,0xE1
db 0,2,0x20,0,0x27,0,0x1F,1,0x5F,0,0x65,0,0x69,0,0x6D,0
db 0x71,0,0x75,0,0x79,0,0x7D,0,0x81,0,0x85,0,0x89,0,0x8D,0
db 0x91,0,0x95,0,0x99,0,0x9D,0,0xA1,0,0xA5,0,0xA9,0,0xAD,0
db 0xB1,0,0xB5,0,0xB9,0,0xBD,0,0xC1,0,0xC5,0,0xC9,0,0xCD,0
db 0xD1,0,0xD5,0,0xD9,0,0xDD,0,7,0,0x61,0,2,0,0,7
db 0,0x5B,0,1,0x61,0,7,0,0x18,0,1,0x50,0,7,0,0x21
db 0,3,0x1F,0,7,0,0x11,0,3,0,0,7,0,1,0,3
db 15,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
db 0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0
db 0,0,0,0,0,0,0,0,0,0,0,0,32,32,32,32
db "MESSAGE_DDB",0,32,0X40,0X31,0
db 32,32,32,"CLASS",0,"'RCODE",0
db 0,0,32,32,32,"PRELOAD DISCARDABLE",0,0
db "CONFORMING",0,0,0,0,0,0,0,0,0,0,0,0
;times 0xD20 db 0
;align 0x1000
align 0x400
;END:
;END:
;END:
;END:
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;0x1000 in the file has a base of .00000000
;in HIEW, why?
;;
;INIT: Page 0
;INIT: Page 0
;INIT: Page 0
;INIT: Page 0
db 0,0,0,0,0,4,0,0,1,0,0,0,"MESSAGE"
db 32,0,0,0,0x80,0,0,0,0,0,0,0,0
times 16 db 0
dd 0,0,0
db "verPP",0,0,0
db "1vsR2vsR3vsR",0x83,0xE8,7,0x83,0xF8,0x21,0x73,7,-1,0x24,0x85,0,0,0,0,0xF8
db 0xC3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
times 16*7 db 0
db 0,0,0,0,0,0xCC,0xCC,0xCC,0,0,0,0,0,0,0,0
;times 16*17 db 0
%define round(n, r) (((n+(r-1))/r)*r)
align 512
;END: Page 0
;END: Page 0
;END: Page 0
;END: Page 0
;align 512
;_s:
;times (512-_s) db 0
;dd round(_s,512)
;times (512-$$) db 0
;INIT: Page 1
;INIT: Page 1
;INIT: Page 1
;INIT: Page 1
bits 32
mov ecx,0 ;xor ecx,ecx
;xor ecx,ecx
db 0xCD,0x20,3,0,1,0 ;VMMcall Get_Sys_VM_Handle
mov eax,0x30
mov edi,0
xor esi,esi
xor edx,edx
;db 0xCD,0x20,4,0,17,0 ;VxDcall VMCPD.Set_CR0_State ;17 instead of 0x17, try WORD or DWORD values
db 0xCD,0x20,4,0,0x17,0 ;VxDcall SHELL.Message
ret
mov ecx,0
db 0xEB,0xDE ;jmps .5
;times 9 db 0
;times 16*29 db 0
align 512
;END: Page 1
;END: Page 1
;END: Page 1
;END: Page 1
;INIT: Page 2
;INIT: Page 2
;INIT: Page 2
;INIT: Page 2
db "VxD MessageBox",0
db "A VM is created",0
db "A VM is destroyed",0
;db "SEC",0x0B
;db "MESSAGE_DDB",0,0,0
;END: Page 2
;END: Page 2
;END: Page 2
;END: Page 2

Related

Google Apps Script Not Reading editable fields on PDF

I wrote a Google Apps Script that uses OCR to get the text from a PDF file on Google Drive, fetched with getFileById(). The problem is that the file is an Adobe PDF form, and the code reads only the static text, not the values that were entered into the form fields. Does anyone have a suggestion for how I can get those values?
The file I used as an example: http://foersom.com/net/HowTo/data/OoPdfFormExample.pdf (please fill in the file and save it to your Drive to test).
The PDF on my Drive:
[image: PDF file]
This is the result:
[image: values shown when the code is executed]
Here is my code:
function extractTextFromPDF() {
  var fileId = '[File ID here]';
  // Get the PDF file's content as a blob
  var blob = DriveApp.getFileById(fileId).getBlob();
  var resource = {
    title: blob.getName(),
    mimeType: blob.getContentType(),
  };
  // Requires the Advanced Drive Service (Drive API v2) to be enabled.
  // Inserts a copy of the PDF, converted to a Google Doc with OCR.
  var file = Drive.Files.insert(resource, blob, { ocr: true, ocrLanguage: 'en' });
  //,supportsAllDrives: true
  // Extract the text from the converted document
  var doc = DocumentApp.openById(file.id);
  var text = doc.getBody().getText();
  Logger.log(text);
  return text;
}
The FDF data does not need to be stored in the order you see on the page; often it is in the order the fields were added. The easiest way to see the /V (value) entries that answer the /T (text field name) entries is via the FDF, which a user can save and send without the source PDF. This has the added advantage that a check box is sent as /V /Yes or /V /Off (= NOT checked), which is notoriously difficult to read from a binary PDF. However, you need to know which text on the page was Language 1, since the author did not tag it as Deutsch!
What intrigues me most (without looking closer) is that the count of attached objects only went up by 12, yet there are more than that number of potentially added answers, from a field :-) of 17.
/Lang(en-GB)
/AcroForm<</Fields[
5 0 R 7 0 R 8 0 R 9 0 R 10 0 R
11 0 R 12 0 R 13 0 R 14 0 R 16 0 R
17 0 R 18 0 R 19 0 R 20 0 R 21 0 R
22 0 R 23 0 R
]/DR 37 0 R/NeedAppearances true>>
Anyway, if you don't want to do it the easy way, you will need to program something to read those object chains. Personally, I find it takes each user only a few seconds to export the form data or have it sent in the background by email; however, most modern users don't use the historic mailto:, so you have to get them to send the form by web mail, for you to then press the extract button.
So here is object 5, and we can see it is for your Given Name:
5 0 obj
<</Type/Annot/Subtype/Widget/F 4
/Rect[165.7 453.7 315.7 467.9]
/FT/Tx
/P 1 0 R
/T(Given Name Text Box)
/TU<FEFF004600690072007300740020006E0061006D0065>
/V <FEFF>
/DV <FEFF>
/MaxLen 40
/DR<</Font 6 0 R>>
/DA(0 0 0 rg /F3 11 Tf)
/AP<<
/N 38 0 R
>>
>>
endobj
So the /TU entry 004600690072007300740020006E0061006D0065 converts to "First name", and we can see the answer in /V when the user sends back the form as a PDF:
5 0 obj
<</AP<</N 38 0 R >>/DA(0 0 0 rg /F3 11 Tf)/DR<</Font 6 0 R >>/DV<FEFF>/F 4/FT/Tx/MaxLen 40/P 1 0 R /Rect[ 165.7 453.7 315.7 467.9]/Subtype/Widget/T(Given Name Text Box)
/TU<FEFF004600690072007300740020006E0061006D0065>/Type/Annot/V(Brunno)>>
endobj
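Reading such an object programmatically can be sketched as follows. This is a deliberately naive illustration in Python, not a PDF parser: it assumes the field object is already available as uncompressed text, that the strings contain no escaped or nested parentheses, and the helper names are made up. It decodes the hex-encoded /TU string and picks out /T and /V from the object shown above.

```python
import re


def decode_pdf_hex_string(hex_digits):
    """Decode a PDF hex string; a leading FEFF marks UTF-16BE text."""
    raw = bytes.fromhex(hex_digits)
    if raw.startswith(b"\xfe\xff"):
        return raw[2:].decode("utf-16-be")
    # Rough stand-in for PDFDocEncoding (an assumption for this sketch)
    return raw.decode("latin-1")


def read_field(obj_text):
    """Naively pull /T, /TU and /V out of one uncompressed field object."""
    name = re.search(r"/T\s*\(([^)]*)\)", obj_text)
    tooltip = re.search(r"/TU\s*<([0-9A-Fa-f]+)>", obj_text)
    value = re.search(r"/V\s*\(([^)]*)\)", obj_text)
    return {
        "name": name.group(1) if name else None,
        "tooltip": decode_pdf_hex_string(tooltip.group(1)) if tooltip else None,
        "value": value.group(1) if value else None,
    }


# Object 5 as returned in the filled-in PDF
FIELD_OBJECT = (
    "<</AP<</N 38 0 R >>/DA(0 0 0 rg /F3 11 Tf)/DR<</Font 6 0 R >>/DV<FEFF>/F 4/FT/Tx"
    "/MaxLen 40/P 1 0 R /Rect[ 165.7 453.7 315.7 467.9]/Subtype/Widget"
    "/T(Given Name Text Box)"
    "/TU<FEFF004600690072007300740020006E0061006D0065>/Type/Annot/V(Brunno)>>"
)

field = read_field(FIELD_OBJECT)
print(field)
```

A real document would additionally need the cross-reference table and any compressed object streams handled, which is why exporting the FDF remains the easier route.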

FASM - Boot sector on USB don't work

First, sorry for my bad English, I'm French.
At the moment I am learning asm with FASM, to try out boot sector programming.
I have written a simple boot program, compiled it, and written boot.bin to the first sector of my USB drive.
But when I boot from it on my PC or in VirtualBox, the drive isn't found.
Boot sector code:
;=======================================================================
; a simpliest 1.44 bootable image by shoorick ;)
;=======================================================================
_bs equ 512
_st equ 18
_hd equ 2
_tr equ 80
;=======================================================================
org 7C00h
jmp start
nop
;=====================================================
db "HE-HE OS"; ; 8
dw _bs ; b/s
db 1 ; s/c
dw 1 ; rs
db 2 ; fats
dw 224 ; rde
dw 2880 ; as
db 0F0h ; media
dw 9 ; s/fat
dw _st ; s/t
dw _hd ; h
dd 0 ; hs
dd 0 ; --
db 0 ; drv
db 0 ; --
db 29h ; ebr
dd 0 ; sn
db "NO NAME    "; ; 11
db "FAT12   "; ; 8
;=====================================================
start:
mov ax,cs
mov ds,ax
mov cx,count
mov si,hello
mov bx,7
mov ah,0Eh
@@:
lodsb
int 10h
loop @B
xor ah,ah
int 16h
int 19h
hello db "Hi! This is disk-invalid!"
count = $ - hello
;=======================================================================
rb 7E00h-2-$
db 055h,0AAh
;=======================================================================
This code is provided by the examples on FASM's website.
There are a couple of common reasons why a bootloader won't work:
the bootloader is not in the first sector of the USB drive, floppy, etc.;
the bootloader is not EXACTLY 512 bytes long;
the 0xAA55 signature is missing from the last 2 bytes of the bootloader.
In your example I assume the bootloader has the wrong size (it is not 512 bytes).
Try replacing
rb 7E00h-2-$
db 055h,0AAh
with
TIMES 510-($-$$) DB 0
DW 0xAA55
This ensures that your file is exactly 512 bytes long and that it has the required bootloader signature.
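The size and signature conditions listed above can be checked on the assembled file before it is written to the drive. The following is a minimal sketch in Python; the function name is made up for illustration, and it checks only these two conditions, not whether the code itself is correct.

```python
def check_boot_sector(image: bytes):
    """Return a list of problems found in a candidate boot sector image."""
    problems = []
    if len(image) != 512:
        problems.append(f"size is {len(image)} bytes, expected 512")
    # The signature bytes are 0x55 then 0xAA (the word 0xAA55, little endian)
    if image[510:512] != b"\x55\xaa":
        problems.append("missing 0x55 0xAA signature at offset 510")
    return problems


# Example with in-memory data: a well-formed sector and a truncated one
good = bytes(510) + b"\x55\xaa"
bad = bytes(100)
print(check_boot_sector(good))
print(check_boot_sector(bad))
```

To check the real file, read it in binary mode, for example `check_boot_sector(open("boot.bin", "rb").read())`.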

Detect position of watermark in a pdf

I am on Ubuntu.
I have a PDF file with pages divided into a grid. Each block of the grid contains the name/age/dob/photo of a candidate. Some records have a watermark "disqualified".
I need to scrape this PDF, with the disqualified candidates in a separate list.
Using pyPdf I was able to get the individual records, but they also include the watermarked candidates.
How can I detect the watermark? If I can get the coordinates of the watermark, how do I match them with the candidate?
I am open to solutions other than Python and pyPdf.
(Actually this is not an answer but merely an analysis too big for a comment.)
I don't know pyPdf (or any Python PDF classes) myself, but here is how the watermark is created for a sample entry; based upon this, anyone who knows pyPdf well enough may more easily advise.
The Roundup
Depending on how pyPdf (or other Python PDF classes) allows access to the page content, there are two basic approaches:
If the class returns information on content (text and images) in the order of the page content stream: the watermark image xobject is referred to right before the data of the entry. Thus, any entry preceded by the drawing of an image xobject is marked.
If, on the other hand, the information is not given in the order of the page content stream, coordinate comparison must be used, which per se is quite straightforward. In that case it might be of interest that the images are inserted with a [0.1 0 0 0.1 0 0] transformation matrix in effect, while the text is drawn with an identity transformation matrix.
The Details
This is entry # 200; the other watermarked entry is constructed similarly:
Watermarking is done by means of an image xobject. There is but one image xobject defined for the page used by both watermarked entries:
4 0 obj
<</Type/Page/MediaBox [0 0 595 841]
/Rotate 0/Parent 3 0 R
/Resources<</ProcSet[/PDF /ImageC /ImageI /Text]
/ColorSpace 18 0 R
/ExtGState 19 0 R
/XObject 20 0 R
/Font 21 0 R
>>
/Contents 5 0 R
>>
endobj
20 0 obj
<</R17
17 0 R>>
endobj
17 0 obj
<</Subtype/Image
/ColorSpace 16 0 R
/Width 128
/Height 88
/BitsPerComponent 8
/Filter/FlateDecode/Length 463>>stream
[...]
endstream
endobj
In the content stream this xobject /R17 is inserted right before the data of the entry itself is drawn:
q 0.1 0 0 0.1 0 0 cm
[...]
q 1045 0 0 495 462.5 6510.5 cm
/R17 Do
Q
q
10 0 0 10 0 0 cm BT
0.000487366 Tc
/R10 8 Tf
1 0 0 1 86 650.75 Tm
(Sex : Male)Tj
0.000304794 Tc
-64 0 Td
(Age : 43)Tj
-0.000140686 Tc
-1 11.05 Td
(House No :)Tj
-0.00002085 Tc
1 31.95 Td
(Name :)Tj
0.00008575 Tc
/R12 7.15 Tf
25.5 17.8 Td
( 200 )Tj
ET
Q
1547.5 6475 485 535.5 re
S
q
10 0 0 10 0 0 cm BT
-0.000403137 Tc
/R14 8 Tf
1 0 0 1 145.1 708.5 Tm
(XVX0001081)Tj
0.000421651 Tc
/R14 7.05 Tf
-90.35 -14.95 Td
(Ramesh Kumar)Tj
0.000373332 Tc
/R10 7.05 Tf
-33 -12.75 Td
(Father's )Tj
0.000193787 Tc
7.3 TL
(Name)'
0.00037774 Tc
/R14 7.05 Tf
40.25 1.8 Td
(Ram Singh)Tj
0 Tc
2.5 -11.85 Td
(37)Tj
0.00137196 Tc
/R12 7.15 Tf
-5.25 13.35 Td
(:)Tj
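For the coordinate-comparison approach, the position of the watermark in page units can be worked out from the cm matrices in the stream excerpt above. The following is a minimal sketch in plain Python (it is not pyPdf code, and the helper names are made up for illustration): it concatenates the matrices the way the cm operator does and maps the image's unit square onto the page.

```python
def concat(m, ctm):
    """Return m x ctm for PDF matrices given as (a, b, c, d, e, f)."""
    a1, b1, c1, d1, e1, f1 = m
    a2, b2, c2, d2, e2, f2 = ctm
    return (
        a1 * a2 + b1 * c2, a1 * b2 + b1 * d2,
        c1 * a2 + d1 * c2, c1 * b2 + d1 * d2,
        e1 * a2 + f1 * c2 + e2, e1 * b2 + f1 * d2 + f2,
    )


def transform(m, x, y):
    """Map the point (x, y) through the matrix m."""
    a, b, c, d, e, f = m
    return (a * x + c * y + e, b * x + d * y + f)


outer = (0.1, 0, 0, 0.1, 0, 0)            # q 0.1 0 0 0.1 0 0 cm
image = (1045, 0, 0, 495, 462.5, 6510.5)  # matrix in effect for /R17 Do
text = (10, 0, 0, 10, 0, 0)               # matrix set before each BT

# An image xobject is painted into the unit square of its coordinate system
image_ctm = concat(image, outer)
lower_left = transform(image_ctm, 0, 0)
upper_right = transform(image_ctm, 1, 1)

# The text is drawn under 10 x 0.1, i.e. effectively the identity matrix
text_ctm = concat(text, outer)

print(lower_left, upper_right, text_ctm)
```

Under these assumptions the watermark covers roughly x 46 to 151 and y 651 to 701 in page units, so it can be compared directly with text positions such as `1 0 0 1 86 650.75 Tm`. How close a text position has to be to count as belonging to the watermarked entry is something the excerpt alone does not settle.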

How to add text object to existing pdf

I have a source PDF which I am modifying by adding text objects. I am using "Incremental Updates" as described in the PDF specification. But while adding text objects this way I am making some mistakes, due to which the PDF doesn't render properly in Adobe Reader 11. When the PDF is opened and I double-click on it, the added text objects get deleted. I figured out that this is due to a text annotation.
Now I want to know how a new text object can be added using an incremental update. How do the Contents and RC entries of a free text annotation have to be maintained?
Also, is it possible to disable or delete the annotation so that my problem can be avoided easily? I want a simple PDF; I don't want annotation options.
The source pdf I am using is here.
The modified pdf after adding text object is here.
I am not sure that the source PDF itself is proper according to the PDF specification.
First off, let me show you how easy things are if you can use a decent PDF library. I use iTextSharp as an example, but the same can also be done with others like PDFBox or PDFNet (already mentioned by @Ika in his answer):
PdfReader reader = new PdfReader(sourcePdf);
using (PdfStamper stamper = new PdfStamper(reader, targetPdfStream)) {
    Font FONT = new Font(Font.FontFamily.HELVETICA, 12, Font.BOLD, new GrayColor(0.75f));
    PdfContentByte canvas = stamper.GetOverContent(1);
    ColumnText.ShowTextAligned(
        canvas,
        Element.ALIGN_LEFT,
        new Phrase("Hello people!", FONT),
        36, 540, 0
    );
}
(Derived from the Webified iTextSharp Example StampText.cs explained in chapter 6 of iText in Action — 2nd Edition.)
(Which PDF library you choose depends on your general requirements and the available license models.)
If, in spite of the ease of use of such PDF libraries, you insist on doing it manually, here are some remarks:
First you have to find the Page dictionary of the page you want to add content to. Depending on the type of PDF this already might require decompression of object streams etc. but in your sample modified1.pdf that is not necessary:
7 0 obj
<</Rotate 90
/Type /Page
/TrimBox [ 9.54 6.12 585.68 835.88 ]
/Resources 8 0 R
/CropBox [ 0 0 595.22 842 ]
/ArtBox [ 9.54 18.36 585.68 842 ]
/Contents [ 9 0 R 10 0 R 11 0 R 12 0 R 13 0 R 14 0 R 15 0 R 16 0 R ]
/Parent 6 0 R
/MediaBox [ 0 0 595.22 842 ]
/Annots 17 0 R
/BleedBox [ 9.54 6.12 585.68 835.88 ]
>>
endobj
You see the array of references to content streams. This is where you have to add new page content. You can manipulate an existing stream or create a new stream and add it to that array.
(Most PDFs have their content stream compressed. For the general case, therefore, you'd have to decompress a stream before you can work on it. Thus, in my eyes, the easier way would be to start a new stream.)
You chose to manipulate the last referenced stream 16 0 which in your PDF is uncompressed:
16 0 obj
<</Length 37 0 R>>
stream
S 1 0 0 1 13.183 0 cm 0 0 m
[...]
0 10 -10 -0 506.238 342.629 Tm
.13333 .11765 .12157 scn
-.0002 Tc
.0006 Tw
(the Bank and branch on which cheque is drawn\).)Tj
/F1 2 Tf
-15.1279 10.9462 Td
(abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789~!@#$%^&*aaaaaaaaaaaaa)Tj
/F2 1 Tf
015.1279 01.9462 Td
(ANAabcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789)Tj
ET
endstream
endobj
Your additions, I gather, are the two 3-liners at the bottom which first select a font, then position the insertion point and finally print a selection of letters.
Now you say you added the text abc...z and ABC...Z just for testing, but the letters b, j, k, q, v, etc. do not appear in the PDF. The problem becomes even more visible for your second addition of letters; here only the capital 'A' and 'N' are displayed.
This is due to the fact that the fonts in question are embedded into the PDF (fonts are embedded so that PDF viewers on systems which don't have the font in question can still display the PDF), but they are not completely embedded; only the subset of characters required from each font is included.
Let's look for the font F2 for which only 'N' and 'A' appear:
According to the page object, the page resources can be found in object 8 0:
8 0 obj
<</Font <</F1 45 0 R /TT2 46 0 R /F2 47 0 R>>
/ExtGState <</GS2 48 0 R>>
/ProcSet [ /PDF /Text ]
/ColorSpace <</Cs6 49 0 R>>
>>
endobj
So F2 is defined in 47 0:
47 0 obj
<</Subtype /Type1
/Type /Font
/Widths [ 722 250 250 250 250 250 250 250 250 250 250 250 250 722 ]
/Encoding 52 0 R
/FirstChar 65
/FontDescriptor 53 0 R
/ToUnicode 54 0 R
/BaseFont /ILBPOB+TimesNewRomanPSMT-Bold
/LastChar 78
>>
endobj
In the referenced ToUnicode map 54 0 you see
54 0 obj
<</Length 55 0 R>>stream
/CIDInit /ProcSet findresource begin 12 dict begin begincmap /CIDSystemInfo <<
/Registry (AAAAAA+F2+0) /Ordering (T1UV) /Supplement 0 >> def
/CMapName /AAAAAA+F2+0 def
/CMapType 2 def
1 begincodespacerange <41> <4e> endcodespacerange
2 beginbfchar
<41> <0041>
<4e> <004E>
endbfchar
endcmap CMapName currentdict /CMap defineresource pop end end
endstream
endobj
In this mapping you see that only the character codes 0x41 'A' and 0x4e 'N' are mapped.
In your document the font is used only to print "NA" in the amount table cells and for nothing else. Thus, only those two letters 'N' and 'A' are embedded, which results in your addition with that font only outputting these letters.
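Checking which character codes a ToUnicode map covers can be sketched as below. This is an illustrative Python snippet with a made-up helper name; it only understands bfchar entries (no bfrange sections), and the final coverage check assumes a simple font whose character codes equal the ASCII codes, as is the case for F2 here.

```python
import re


def mapped_characters(cmap_text):
    """Return {character code: Unicode string} for the bfchar entries of a CMap."""
    mapping = {}
    for block in re.findall(r"beginbfchar(.*?)endbfchar", cmap_text, re.S):
        for code, uni in re.findall(r"<([0-9A-Fa-f]+)>\s*<([0-9A-Fa-f]+)>", block):
            mapping[int(code, 16)] = bytes.fromhex(uni).decode("utf-16-be")
    return mapping


# The relevant part of the ToUnicode stream 54 0
TO_UNICODE = """
1 begincodespacerange <41> <4e> endcodespacerange
2 beginbfchar
<41> <0041>
<4e> <004E>
endbfchar
"""

mapping = mapped_characters(TO_UNICODE)
missing = sorted({ch for ch in "ANAabcXYZ" if ord(ch) not in mapping})
print(mapping, missing)
```

Since ToUnicode is optional, an empty result from such a check does not prove that a glyph is missing; it is only a quick hint for fonts that carry the map.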
Thus, to successfully add text to the page you either have to check the font resources associated with the page for the glyphs they provide (and restrict your additions to those glyphs) or you have to add your own font resource.
As the presence of characters in the encoding often is not as easy to see as it is here (ToUnicode is optional), I would propose that you add your own font resources. The PDF specification ISO 32000-1 explains how to do that.
Furthermore, you state that the x and y axis positions of the text are not displayed properly in the PDF. While you don't say what exactly you mean, you should be aware that in the content stream you can apply affine transformations to the coordinate system of the page, i.e. stretch, skew, rotate, and move the axes.
If you want to use the original coordinate system and not depend on the coordinate system still being unchanged at the point of your additions, you should add an initial content stream to the page containing a q operator (to save the current graphics state on the graphics state stack) and start your additions in a new final content stream with a Q operator (to restore the graphics state by removing the most recently saved state from the stack and making it the current state).
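Doing this by hand might look like the sketch below. It is only an illustration in Python under several assumptions: the object numbers 100 and 101 are placeholders for unused object numbers, /MyFont stands for a font resource you would have to add to the page yourself, and the incremental-update bookkeeping (new cross-reference section and trailer) is not shown.

```python
def stream_object(obj_num, data: bytes) -> bytes:
    """Serialize an uncompressed stream object with a matching /Length."""
    header = b"%d 0 obj\n<</Length %d>>stream\n" % (obj_num, len(data))
    # The end-of-line before "endstream" is not counted in /Length
    return header + data + b"\nendstream\nendobj\n"


def wrap_contents(existing_refs, save_ref, restore_ref):
    """New /Contents order: save stream, the original streams, restore stream."""
    return [save_ref] + list(existing_refs) + [restore_ref]


# First stream: just save the graphics state
save_stream = stream_object(100, b"q")

# Last stream: restore the state, then draw in the original coordinate system
addition = b"Q\nBT\n/MyFont 12 Tf\n1 0 0 1 36 540 Tm\n(Hello)Tj\nET"
restore_stream = stream_object(101, addition)

contents = wrap_contents(
    ["9 0 R", "10 0 R", "11 0 R", "12 0 R", "13 0 R", "14 0 R", "15 0 R", "16 0 R"],
    "100 0 R",
    "101 0 R",
)
print("/Contents [ " + " ".join(contents) + " ]")
```

Keeping the original streams untouched and only adding one stream at each end means none of the existing (possibly compressed) content has to be decoded.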
EDIT: As a sample, I applied the Java equivalent of the C# code at the top to your modified1.pdf with append mode activated. The following objects were changed or added as a result:
The page object 7 0 has been updated:
7 0 obj
<</CropBox[0 0 595.22 842]
/Parent 6 0 R
/Contents[69 0 R 9 0 R 10 0 R 11 0 R 12 0 R 13 0 R 14 0 R 15 0 R 16 0 R 70 0 R]
/Type/Page
/Resources<<
/ExtGState<</GS2 48 0 R>>
/ProcSet [/PDF /Text /ImageB /ImageC /ImageI]
/ColorSpace<</Cs6 49 0 R>>
/Font<</F1 45 0 R/F2 47 0 R/TT2 46 0 R/Xi0 68 0 R>>
>>
/MediaBox[0 0 595.22 842]
/TrimBox[9.54 6.12 585.68 835.88]
/BleedBox[9.54 6.12 585.68 835.88]
/Annots 17 0 R
/ArtBox[9.54 18.36 585.68 842]
/Rotate 90
>>
endobj
If you compare with your former version, you see that
two new content streams have been added, 69 0 at the start and 70 0 at the end;
the resources are not an indirect object anymore but instead are directly included here;
the resources contain a new Font resource Xi0 at 68 0.
Now let's look at the added objects.
This is the font resource for Helvetica-Bold named Xi0 at 68 0:
68 0 obj
<</BaseFont/Helvetica-Bold
/Type/Font
/Encoding/WinAnsiEncoding
/Subtype/Type1
>>
endobj
Non-embedded, standard 14 font resources are not complicated at all...
Now there are the additional content streams. iText does compress them, but I'll show them in an uncompressed state here:
69 0 obj
<</Length 1>>stream
q
endstream
endobj
70 0 obj
<</Length 106>>stream
Q
q
0 1 -1 0 595.22 0 cm
q
BT
1 0 0 1 36 540 Tm
/Xi0 12 Tf
0.75 g
(Hello people!)Tj
0 g
ET
Q
Q
endstream
endobj
So the new content stream at the start saves the current graphics state, and the new one at the end restores that saved state, changes the coordinate system, positions the text insertion point, selects the font, font size, and fill colour, and finally prints a string.

PE Export Directory Table's OrdinalBase field ignored?

In my experience and that of others (http://webster.cs.ucr.edu/Page_TechDocs/pe.txt), the PE/COFF specification document incorrectly claims that the Export Address Table indices contained in the Ordinal Table are relative to the Ordinal Base, and it even gives an incorrect example (Section 5.3). In actuality, the indices in the Ordinal Table are 0-based indices into the Address Table for the normal case in which Ordinal Base = 1. I have seen this in Visual Studio-generated PE libraries and in system libraries like Kernel32.dll.
My question is: have you ever observed a binary with an Ordinal Base that was not equal to 1? I want to know if this is an off-by-one error, or if the Ordinal Base is never applied to Ordinal Table entries.
Here's a dump for mfc42.dll, version 6.06.8064.0.
Microsoft (R) COFF/PE Dumper Version 9.00.30729.01
Copyright (C) Microsoft Corporation. All rights reserved.
Dump of file mfc42.dll
File Type: DLL
Section contains the following exports for MFC42.dll
00000000 characteristics
4D79A4A3 time date stamp Fri Mar 11 05:27:15 2011
0.00 version
5 ordinal base
6939 number of functions
6 number of names
ordinal hint RVA name
5 0 0000ED7C ?classCCachedDataPathProperty@CCachedDataPathProperty@@2UCRuntimeClass@@B
6 1 0000ED44 ?classCDataPathProperty@CDataPathProperty@@2UCRuntimeClass@@B
7 2 000DEEAC DllCanUnloadNow
8 3 000DEE6C DllGetClassObject
9 4 000DED0A DllRegisterServer
10 5 000DEEDE DllUnregisterServer
256 0004F84F [NONAME]
[...]
6943 0003B412 [NONAME]
Here's how it looks in the binary:
;
; Export directory for MFC42.dll
;
dd 0 ; Characteristics
dd 4D79A4A3h ; TimeDateStamp: Fri Mar 11 05:27:15 2011
dw 0 ; MajorVersion
dw 0 ; MinorVersion
dd rva aMfc42_dll ; Name
dd 5 ; Base
dd 1B1Bh ; NumberOfFunctions
dd 6 ; NumberOfNames
dd rva functbl ; AddressOfFunctions
dd rva nametbl ; AddressOfNames
dd rva nameordtbl ; AddressOfNameOrdinals
;
; Export Address Table for MFC42.dll
;
functbl dd rva ?classCCachedDataPathProperty@CCachedDataPathProperty@@2UCRuntimeClass@@B; 0
dd rva ?classCDataPathProperty@CDataPathProperty@@2UCRuntimeClass@@B; 1
dd rva DllCanUnloadNow ; 2
dd rva DllGetClassObject; 3
dd rva DllRegisterServer; 4
dd rva DllUnregisterServer; 5
dd 0F5h dup(rva __ImageBase); 6
dd rva ??0_AFX_CHECKLIST_STATE@@QAE@XZ; 251
[...]
;
; Export Names Table for MFC42.dll
;
nametbl dd rva a?classccachedd, rva a?classcdatapat, rva aDllcanunloadno
dd rva aDllgetclassobj, rva aDllregisterser, rva aDllunregisters
;
; Export Ordinals Table for MFC42.dll
;
nameordtbl dw 0, 1, 2, 3, 4, 5
So yes, it seems you're right, and the indexes in the ordinal table are 0-based.
It's not an off-by-one error: the Ordinal Base is not applied to the Ordinal Table entries but to the calculation of the ordinal itself. And yes, the Microsoft PE specification (http://msdn.microsoft.com/en-us/library/windows/hardware/gg463119.aspx, section 5.3.4) is wrong. This is how the calculations should be done:
i = Search_ExportNamePointerTable(ExportName);
ordinal = ExportOrdinalTable[i] + OrdinalBase; // The "+ OrdinalBase" is missing in the official PE specification
SymbolRVA = ExportAddressTable[ordinal - OrdinalBase];
Or, expressed in a different way:
i = Search_ExportNamePointerTable(ExportName);
offset = ExportOrdinalTable[i];
SymbolRVA = ExportAddressTable[offset];
ordinal = OrdinalBase + offset;
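The second form of the calculation can be tried out on the values from the first dump above. This is a small sketch in Python with in-memory stand-ins for the tables (the dictionary layout and function names are made up for illustration; only the four undecorated names and one unnamed export are included), not code that parses a real DLL.

```python
ORDINAL_BASE = 5

# Position of each name in the Export Name Pointer Table (first dump)
NAME_INDEX = {
    "DllCanUnloadNow": 2,
    "DllGetClassObject": 3,
    "DllRegisterServer": 4,
    "DllUnregisterServer": 5,
}

# Export Ordinal Table as found in the binary: 0-based, no base applied
ORDINAL_TABLE = [0, 1, 2, 3, 4, 5]

# Sparse stand-in for the Export Address Table: index -> RVA
ADDRESS_TABLE = {
    0: 0x0000ED7C, 1: 0x0000ED44, 2: 0x000DEEAC,
    3: 0x000DEE6C, 4: 0x000DED0A, 5: 0x000DEEDE,
    251: 0x0004F84F,
}


def lookup_by_name(name):
    """Return (ordinal, RVA) for a named export."""
    i = NAME_INDEX[name]               # Search_ExportNamePointerTable
    offset = ORDINAL_TABLE[i]          # index into the address table
    return ORDINAL_BASE + offset, ADDRESS_TABLE[offset]


def lookup_by_ordinal(ordinal):
    """Return the RVA for an export requested by ordinal."""
    return ADDRESS_TABLE[ordinal - ORDINAL_BASE]


ordinal, rva = lookup_by_name("DllRegisterServer")
print(ordinal, hex(rva), hex(lookup_by_ordinal(256)))
```

The results agree with the dumpbin listing: DllRegisterServer has ordinal 9 at RVA 000DED0A, and ordinal 256 lands on address table index 251.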
If I dump my mfc42.dll...
dumpbin mfc42.dll /exports |more
...this is what I get:
Microsoft (R) COFF/PE Dumper Version 12.00.20827.3
Copyright (C) Microsoft Corporation. All rights reserved.
Dump of file mfc42.dll
File Type: DLL
Section contains the following exports for MFC42.dll
00000000 characteristics
4D798B26 time date stamp Fri Mar 11 03:38:30 2011
0.00 version
5 ordinal base
6888 number of functions
8 number of names
ordinal hint RVA name
1452 0 000EF5D8 ?AfxFreeLibrary@@YAHPEAUHINSTANCE__@@@Z
1494 1 000EF5A4 ?AfxLoadLibrary@@YAPEAUHINSTANCE__@@PEBD@Z
1497 2 000F8344 ?AfxLockGlobals@@YAXH@Z
1587 3 000F83DC ?AfxUnlockGlobals@@YAXH@Z
7 4 000FC83C DllCanUnloadNow
8 5 000FC7E0 DllGetClassObject
9 6 000FC870 DllRegisterServer
10 7 000FC87C DllUnregisterServer
5 0001C910 [NONAME]
6 0001C8E8 [NONAME]
256 0005DEC0 [NONAME]
257 000423C0 [NONAME]
258 00042400 [NONAME]
259 00042440 [NONAME]
[...]
The 7th function (for example) above is DllRegisterServer, which corresponds to the 7th word (0x0004) in the export ordinal table in the hex dump of mfc42.dll below. The table starts at A7 05.
59 CC 12 00 6B CC 12 00 A7 05 D1 05 D4 05 2E 06
02 00 03 00 04 00 05 00 4D 46 43 34 32 2E 64 6C
The calculations:
i = Search_ExportNamePointerTable("DllRegisterServer") = 7 - 1 = 6 // zero-based
offset = ExportOrdinalTable[6] = 4
SymbolRVA = ExportAddressTable[4] = ...
ordinal = OrdinalBase + offset = 5 + 4 = 9
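The same calculation can be applied to the whole ordinal table from the hex dump. The following Python sketch decodes the eight little-endian words starting at A7 05 and adds the Ordinal Base of 5; the variable names are chosen for illustration only.

```python
import struct

ORDINAL_BASE = 5

# The eight 16-bit entries of the export ordinal table, starting at A7 05
table_bytes = bytes.fromhex("A7 05 D1 05 D4 05 2E 06 02 00 03 00 04 00 05 00")

# "<8H" = eight unsigned 16-bit little-endian values
offsets = struct.unpack("<8H", table_bytes)
ordinals = [ORDINAL_BASE + offset for offset in offsets]

for hint, (offset, ordinal) in enumerate(zip(offsets, ordinals)):
    print(hint, offset, ordinal)
```

The resulting ordinals are the ones dumpbin prints in its ordinal column for hints 0 through 7.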
No, the PE Export Directory Table's OrdinalBase field is NOT ignored!
The sample provided above (mfc42.dll) is a good one, since its Ordinal Base is not 1.
Here are two remarks about this issue:
. The output of the dump tool is correct as far as the ordinal field is concerned. It shows that the Base field is 5. This means that, when importing an exported function from mfc42.dll by name, the computed offset in the Export Address Table will be x-5, where x is the function's ordinal. The Microsoft specification Section 5.3 is correct.
. The output of the dump tool is NOT correct as far as the Hint is concerned. Export Tables have NO Hint field; ONLY Import Tables have a Hint field.
As a matter of fact, the Ordinal Base is applied NOT in the Ordinal Table BUT when retrieving the index into the Address Table!