pdfbox - add visual signature. COSObject cast error - pdf

The class org.apache.pdfbox.pdmodel.interactive.digitalsignature.SignatureOptions has a setVisualSignature method. With it I can create a visual signature from another PDF stream that already has a visual signature appearance (to copy the appearance).
1) I created a PDF with a signature appearance and, using the setVisualSignature() method, managed to copy the visual signature. Everything works.
2) I then change the visual signature (replace the image) using PDFBox. To get the COSObject:
Iterator<Entry<COSObjectKey, Long>> xrefEntriesIt = doc.getDocument()
.getXrefTable().entrySet().iterator();
while (xrefEntriesIt.hasNext()) {
COSObject object = doc.getDocument().getObjectFromPool(
xrefEntriesIt.next().getKey());
if (object.getDictionaryObject(COSName.SUBTYPE) == COSName.IMAGE) {
changeImage(object, doc);
}
}
and to change the image:
private static void changeImage(COSObject obj, PDDocument doc) throws IOException, COSVisitorException {
PDXObjectImage imageInPdf =
(PDXObjectImage) PDXObject.createXObject((COSStream) obj.getObject());
File inputFile = new File("/new_SIGNATURE_IMG.jpg");
PDXObjectImage newImage = new PDJpeg(doc, new FileInputStream(inputFile));
imageInPdf.getCOSStream().replaceWithStream(newImage.getCOSStream());
doc.save("/new.pdf");
}
Everything works.
3) When I call the setVisualSignature() method with the new PDF containing the new appearance image (the one I changed with my code), I get this error:
Exception in thread "main" java.lang.ClassCastException:
org.apache.pdfbox.cos.COSObject cannot be cast to
org.apache.pdfbox.cos.COSDictionary at
org.apache.pdfbox.pdmodel.PDDocument.addSignature(PDDocument.java:474)
Those are the samples.
What is going wrong? Am I changing the image incorrectly?

The difference between template.pdf and CHANGED_TEMPLATE.pdf is that the signature field dictionary in the former one contains its appearance streams dictionary as a direct object:
9 0 obj
<< [...] /AP<</N 8 0 R>>>>
endobj
while in the latter one the appearance streams dictionary is an indirect object only referenced from the signature field dictionary:
5 0 obj
<<
[...]
/AP 10 0 R
>>
[...]
10 0 obj
<<
/N 15 0 R
>>
This is perfectly OK; the PDF specification does not require it to be direct in general:
AP dictionary (Optional; PDF 1.2) An appearance dictionary specifying how the annotation shall be presented visually on the page (see 12.5.5, “Appearance Streams”). Individual annotation handlers may ignore this entry and provide their own appearances.
(Table 164 in ISO 32000-1:2008)
Unfortunately the code where the exception occurs, i.e. the PDDocument method addSignature in line 474, looks like this:
PDAppearanceDictionary ap =
new PDAppearanceDictionary((COSDictionary)cosBaseDict.getItem(COSName.AP));
Thus, PDFBox here expects the /AP value to be a direct dictionary object, not some reference to an indirect dictionary object.
I assume your first manipulation makes PDFBox rewrite the PDF in a way it assumes to be best (which seems to include making dictionaries indirect objects), and then PDFBox has other expectations...
If you made your first manipulation as an incremental update instead of a complete rewrite, PDFBox might leave the appearances dictionary untouched.
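The difference between casting the raw /AP entry and resolving it first can be illustrated without PDFBox. The following is only a minimal sketch: the classes Dict, Ref and ApLookup are hypothetical stand-ins for COSDictionary and COSObject, not PDFBox API. In PDFBox itself, COSDictionary.getDictionaryObject(COSName.AP) should do the dereferencing that getItem does not, but check that against the PDFBox version you use.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-ins for COSDictionary / COSObject; not PDFBox classes.
final class Dict {
    final Map<String, Object> items = new HashMap<>();
}

final class Ref {
    final Object target; // the object this indirect reference points to
    Ref(Object target) { this.target = target; }
}

final class ApLookup {
    // Follows a reference if there is one, otherwise returns the value itself.
    static Object resolve(Object value) {
        return (value instanceof Ref) ? ((Ref) value).target : value;
    }

    // Mirrors the failing pattern: a blind cast of the raw entry.
    static Dict rawCast(Dict field) {
        return (Dict) field.items.get("AP");
    }

    // Resolves first, then casts; works for direct and indirect entries.
    static Dict resolvedCast(Dict field) {
        return (Dict) resolve(field.items.get("AP"));
    }

    public static void main(String[] args) {
        Dict appearance = new Dict();

        Dict direct = new Dict();              // like "/AP <</N 8 0 R>>"
        direct.items.put("AP", appearance);

        Dict indirect = new Dict();            // like "/AP 10 0 R"
        indirect.items.put("AP", new Ref(appearance));

        System.out.println(rawCast(direct) == appearance);        // true
        System.out.println(resolvedCast(indirect) == appearance); // true
        try {
            rawCast(indirect);
        } catch (ClassCastException e) {
            System.out.println("raw cast fails for the indirect entry");
        }
    }
}
```

The raw cast only succeeds while the entry happens to be a direct dictionary, which is exactly the situation that changes once the file has been rewritten.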

Related

Pdf signature invalidates existing signature in Acrobat Reader

I'm using iText 7.1.15 and SignDeferred to apply signatures to PDF documents.
SignDeferred is required since the signature is created by a PKCS#11 hardware token (USB key).
When I sign a "regular" PDF, e.g. one created via Word, I can apply multiple signatures and all signatures are shown as valid in Adobe Acrobat Reader.
If the PDF was created by combining multiple PDF documents with Adobe DC, the first signature is valid but becomes invalid as soon as the second signature is applied.
[Screenshot: document in Adobe Reader after the first signature is applied]
[Screenshot: document in Adobe Reader after the second signature is applied]
The signatures of the same document are shown as valid in Foxit Reader.
I've found a similar issue on Stack Overflow (multiple signatures invalidate first signature in iTextSharp pdf signing), but it was using iText 5 and I'm not sure it is the same problem.
Question: What can I do in order to keep both signatures valid in Acrobat Reader?
Unsigned Pdf document on which the first signature becomes invalid:
https://github.com/suntsu42/iTextDemoInvalidSecondSignature/blob/master/test.pdf
Twice signed document which is invalid:
https://github.com/suntsu42/iTextDemoInvalidSecondSignature/blob/master/InvalidDocumentSignedTwice.pdf
Code used for signing
//Step #1 >> prepare pdf for signing (Allocate space for the signature and calculate hash)
using (MemoryStream input = new MemoryStream(pdfToSign))
{
using (var reader = new PdfReader(input))
{
StampingProperties sp = new StampingProperties();
sp.UseAppendMode();
using (MemoryStream baos = new MemoryStream())
{
var signer = new PdfSigner(reader, baos, sp);
//Has to be NOT_CERTIFIED since otherwise a pdf cannot be signed multiple times
signer.SetCertificationLevel(PdfSigner.NOT_CERTIFIED);
if (visualRepresentation != null)
{
try
{
PdfSignatureAppearance appearance = signer.GetSignatureAppearance();
base.SetPdfSignatureAppearance(appearance, visualRepresentation);
}
catch (Exception ex)
{
throw new Exception("Unable to set provided signature image", ex);
}
}
//Make the SignatureAttributeName unique
SignatureAttributeName = $"SignatureAttributeName_{DateTime.Now:yyyyMMddTHHmmss}";
signer.SetFieldName(SignatureAttributeName);
DigestCalcBlankSigner external = new DigestCalcBlankSigner(PdfName.Adobe_PPKLite, PdfName.Adbe_pkcs7_detached);
signer.SignExternalContainer(external, EstimateSize);
hash = external.GetDocBytesHash();
tmpPdf = baos.ToArray();
}
}
//Step #2 >> Create the signature based on the document hash
// This is the part which accesses the HSM via PKCS11
byte[] signature = null;
if (LocalSigningCertificate == null)
{
signature = CreatePKCS7SignatureViaPKCS11(hash, pin);
}
else
{
signature = CreatePKCS7SignatureViaX509Certificate(hash);
}
//Step #3 >> Apply the signature to the document
ReadySignatureSigner extSigContainer = new ReadySignatureSigner(signature);
using (MemoryStream preparedPdfStream = new MemoryStream(tmpPdf))
{
using (var pdfReader = new PdfReader(preparedPdfStream))
{
using (PdfDocument docToSign = new PdfDocument(pdfReader))
{
using (MemoryStream outStream = new MemoryStream())
{
PdfSigner.SignDeferred(docToSign, SignatureAttributeName, outStream, extSigContainer);
return outStream.ToArray();
}
}
}
}
}
Sample project
I've created a working sample project which uses a local certificate for signing. I also did update iText to version 7.2 but with the same result.
It also contains the document which cannot be signed twice (test.pdf)
https://github.com/suntsu42/iTextDemoInvalidSecondSignature/tree/master
Edit
I've applied the solution provided by MKL to the sample project on github.
As a second note, it is also possible to use PdfSigner, but in this case the bookmarks of the original document must be removed.
As already mentioned in a comment, the example document "InvalidDocumentSignedTwice.pdf" was not signed by means of an incremental update, so here it is obvious that former signatures will break. But this is not the issue in the OP's example project. Thus, I analyze the issue with an eye on the actual outputs of the example project.
Analyzing the Issue
When validating signed PDFs, Adobe Acrobat executes two types of checks:
1) It checks the signature itself and whether the revision of the PDF it covers is untouched.
2) (If there are additions to the PDF after the revision covered by the signature:) It checks whether changes applied in incremental updates only consist of allowed changes.
The former check is pretty stable and standard but the second one is very whimsical and prone to incorrect negative validation results. Like in your case...
In case of your example document one can simply determine that the first check must positively validate the first signature: The file with only one (valid!) signature constitutes a byte-wise starting piece of the file with two signatures, so nothing can have been broken here.
Thus, the second type of check, the fickle type, must go wrong in the case at hand.
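The byte-wise "starting piece" argument is easy to verify yourself. Here is a minimal sketch of such a check; the class name RevisionPrefixCheck and the file names passed on the command line are my own placeholders, not part of iText or of the example project.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

final class RevisionPrefixCheck {
    // True if 'earlier' is byte-for-byte the beginning of 'later'.
    static boolean isPrefix(byte[] earlier, byte[] later) {
        if (earlier.length > later.length) {
            return false;
        }
        for (int i = 0; i < earlier.length; i++) {
            if (earlier[i] != later[i]) {
                return false;
            }
        }
        return true;
    }

    // Usage (placeholder file names): java RevisionPrefixCheck signedOnce.pdf signedTwice.pdf
    public static void main(String[] args) throws IOException {
        byte[] once = Files.readAllBytes(Paths.get(args[0]));
        byte[] twice = Files.readAllBytes(Paths.get(args[1]));
        System.out.println(isPrefix(once, twice)
                ? "first revision is intact"
                : "first revision was altered");
    }
}
```

If this reports an intact first revision, the bytes covered by the first signature are unchanged, and only the check of the later additions can be responsible for the negative result.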
To find out which change triggers the failure, one has to analyze the changes made during signing. A helpful fact is that doing the same using iText 5 does not produce the issue; thus, the change that triggers the check must be in what iText 7 does differently from iText 5 here. And the main difference in this context is that iText 7 has more thorough tagging support than iText 5 and, therefore, also adds a reference to the new signature field to the document structure tree.
This by itself does not yet trigger the whimsical check, though, it only does so here because one outline element refers to the parent structure tree element of the change as its structure element (SE). Apparently Adobe Acrobat considers the change in the associated structure element as a change of the outline link and, therefore, as a (disallowed) change of the behavior of the document revision signed by the first signature.
So is this an iText error (adding entries to the structure tree) or an Adobe Acrobat error (complaining about the additions)? Well, in a tagged PDF (and your PDF has the corresponding Marked entry set to true) the content including annotations and form fields is expected to be tagged. Thus, addition of structure tree entries for the newly added signature field and its appearance not only should be allowed but actually recommended or even required! So this appears to be an error of Adobe Acrobat.
A Work-Around
Knowing that this appears to be an Adobe Acrobat bug is all well and good, but at the end of the day one might need a way now to sign such documents multiple times without current Adobe Acrobat calling that invalid.
It is possible to make iText believe there is no structure tree and no need to update a structure tree. This can be done by making the initialization of the document tag structure fail. For this we override the PdfDocument method TryInitTagStructure. As the iText PdfSigner creates its document object internally, we do this in an override of the PdfSigner method InitDocument.
I.e. instead of PdfSigner we use the class MySigner defined like this:
public class MySigner : PdfSigner
{
public MySigner(PdfReader reader, Stream outputStream, StampingProperties properties) : base(reader, outputStream, properties)
{
}
override protected PdfDocument InitDocument(PdfReader reader, PdfWriter writer, StampingProperties properties)
{
return new MyDocument(reader, writer, properties);
}
}
public class MyDocument : PdfDocument
{
public MyDocument(PdfReader reader, PdfWriter writer, StampingProperties properties) : base(reader, writer, properties)
{
}
override protected void TryInitTagStructure(PdfDictionary str)
{
structTreeRoot = null;
structParentIndex = -1;
}
}
When MySigner is used for signing documents, iText no longer adds tagging and so won't make Adobe Acrobat complain about new entries in the structure tree.
The Same Work-Around in Java
As I feel more comfortable working in Java, I analyzed this and tested the work-around in Java.
Here this can be put into a more closed form (maybe it also can for C#, I don't know), instead of initializing the signer like this
PdfSigner pdfSigner = new PdfSigner(pdfReader, os, new StampingProperties().useAppendMode());
we do it like this:
PdfSigner pdfSigner = new PdfSigner(pdfReader, os, new StampingProperties().useAppendMode()) {
@Override
protected PdfDocument initDocument(PdfReader reader, PdfWriter writer, StampingProperties properties) {
return new PdfDocument(reader, writer, properties) {
@Override
protected void tryInitTagStructure(PdfDictionary str) {
structTreeRoot = null;
structParentIndex = -1;
}
};
}
};
(MultipleSignaturesAndTagging test testSignTestManuelTwiceNoTag)
TL;DR
iText 7 adds a structure element for the new signature field to the document structure tree during signing. If the parent node of this new node is referenced as associated structure element of an outline element, though, Adobe Acrobat incorrectly considers this a disallowed change. As a work-around one can tweak iText signing to not add structure elements.
I've got a similar problem with signatures after the last update of Adobe Reader. I wrote a post on their community forum, but they still haven't answered me.
Take a look:
https://community.adobe.com/t5/acrobat-reader-discussions/invalid-signatures-after-adobe-reader-update-2022-001-20085/td-p/12892048
I am using iText v.5.5.5 to generate the PDF. I sign and certify the PDF document in a single method. Moreover, Foxit Reader shows that the signatures are valid. I believe that this is an Adobe bug and it will be fixed soon :)

Using Core Graphics to change an image's colorspace profile using AppleScriptObjC

I already have some code that will do most of what I need using NSImage and NSColorSpace. Unfortunately I am trying to recreate a colorspace/profile change that happens in Photoshop, and it is a bit more complex than what NSColorSpace can do. You can see that post here:
Using ApplescriptObjC to convert color spaces of an image using NSColorSpace and iccProfileData
So what I need help with is either adding in the following from CGColorSpace or recreating certain parts of the script so they work from the start with Core Graphics. The functions that I am looking to accomplish are:
CGColorRenderingIntent using kCGRenderingIntentPerceptual
kCGColorConversionBlackPointCompensation
Plus using dithering as a part of this color space conversion, but I can't seem to find an option for that in the Apple Objective-C documentation.
NSColor does have NSColorRenderingIntentPerceptual but it does not seem like there is the BlackPointCompensation under NSColor.
I think I have identified all the parts I need to build this script. I think the script is partway written already. I just need some help gluing the last few bits together.
I believe the script will still need to load the profile into NSData (theFile is a POSIX file reference to the ICC profile that I am using):
set theData to current application's NSData's dataWithContentsOfFile:theFile
Now I need to open the image; my hope is that this is the same whether using NSColor or CGColor:
set theInput to (choose file with prompt "Choose RGB file")
set theOutput to (choose file name default name "Untitled.jpg")
set theImage to current application's NSImage's alloc()'s initWithContentsOfURL:theInput
set imageRep to theImage's representations()'s objectAtIndex:0
Here is the line of code that I need the most help with. This is actually where the color conversion is happening with NSColorSpace:
set targetSpace to current application's NSColorSpace's alloc's initWithICCProfileData:theData
It seems like I should be using CGColorSpaceCreateICCBased with CGDataProviderRef and then theFile, but I doubt that I can just put those in place of the NSColorSpace and initWithICCProfileData. I also need to graft onto this line, or a new line, the CGColorRenderingIntent using kCGRenderingIntentPerceptual and kCGColorConversionBlackPointCompensation (With dither if that option even exists).
I am not sure if the next two lines need to be updated, but I am pretty sure that the third line can stay the same (or I am really stupid, forgive me).
set theProps to current application's NSDictionary's dictionaryWithObjects:{1.0, true} forKeys:{current application's NSImageCompressionFactor, current application's NSImageProgressive}
set jpegData to bitmapRep's representationUsingType:(current application's NSJPEGFileType) |properties|:theProps
jpegData's writeToURL:theOutput atomically:true
So the input would be an RGB file with a generic sRGB profile and the output would be a CMYK file with a specific CMYK profile (GRACoL2013_CRPC6.icc to be exact).
"The input would be an RGB file with a generic sRGB profile and the output would be a CMYK file with a specific CMYK profile (GRACoL2013_CRPC6.icc)"
If this accurately summarises the objective, one would expect Image Events, an AppleScriptable faceless program for manipulating images, to be able to do this.
However, I played around with Image Events, and embedding a new colour profile, which ought to be possible, doesn't appear to take: the original colour profile remains.
So I wrote the AppleScriptObjC equivalent:
use framework "Foundation"
use framework "AppKit"
use scripting additions
property this : a reference to the current application
property nil : a reference to missing value
property _1 : a reference to reference
property NSBitmapImageRep : a reference to NSBitmapImageRep of this
property NSColorSpace : a reference to NSColorSpace of this
property NSData : a reference to NSData of this
property NSImage : a reference to NSImage of this
property NSString : a reference to NSString of this
property NSURL : a reference to NSURL of this
property JPEG : a reference to 3
property PNG : a reference to 4
property NSFileType : {nil, nil, "jpg", "png"}
property options : {NSImageCompressionFactor:0.75, NSImageProgressive:true ¬
, NSImageColorSyncProfileData:a reference to iccData}
property NSColorRenderingIntent : {Default:0, AbsoluteColorimetric:1 ¬
, RelativeColorimetric:2, Perceptual:3, Saturation:4}
--------------------------------------------------------------------------------
# IMPLEMENTATION:
set iccProfile to loadICCProfile("~/Path/To/GRACoL2013_CRPC6.icc")
set image to _NSImage("~/Path/To/SourceImage.jpg")
set path of image to "~/Path/To/OutputImage.jpg" -- omit to overwrite source
set iccData to iccProfile's space's ICCProfileData()
my (write image for iccProfile given properties:contents of options)
--------------------------------------------------------------------------------
# HANDLERS & SCRIPT OBJECTS:
# __NSURL__()
# Takes a posix +filepath and returns an NSURL object reference
to __NSURL__(filepath)
local filepath
try
NSURL's fileURLWithPath:((NSString's ¬
stringWithString:filepath)'s ¬
stringByStandardizingPath())
on error
missing value
end try
end __NSURL__
# new()
# Instantiates a new NSObject
on new(_nsObject)
local _nsObject
_nsObject's alloc()
end new
# _NSImage()
# Creates a new NSImage instance with image data loaded from the +filepath
on _NSImage(filepath)
local filepath
script
property file : __NSURL__(filepath)
property data : new(NSImage)
property kind : JPEG
property path : nil -- write path (nil = overwrite source)
property size : nil
property name extension : NSFileType's item kind
to init()
my (data's initWithContentsOfURL:(my file))
end init
to lock()
tell my data to lockFocus()
end lock
to unlock()
tell my data to unlockFocus()
end unlock
end script
tell the result
init()
set its size to its data's |size|() as list
return it
end tell
end _NSImage
# ICCProfile()
# Loads a ColorSync profile from the +filepath and creates a new NSColorSpace
# instance
to loadICCProfile(fp)
local fp
script
property file : __NSURL__(fp)
property data : NSData's dataWithContentsOfURL:(my file)
property space : new(NSColorSpace)
property mode : NSColorRenderingIntent's Perceptual
to init()
(my space)'s initWithICCProfileData:(my data)
end init
end script
tell the result
init()
return it
end tell
end loadICCProfile
# write
# Writes out the +_NSImage data optionally converting it to a new colorspace
to write _NSImage for ICC : missing value ¬
given properties:(opt as record) : missing value
local _NSImage, ICC, kind, path, options
set ImageRep to new(NSBitmapImageRep)
_NSImage's lock()
ImageRep's initWithFocusedViewRect:{{0, 0}, _NSImage's size}
ImageRep's bitmapImageRepByConvertingToColorSpace:(ICC's space) ¬
renderingIntent:(ICC's mode)
result's representationUsingType:(_NSImage's kind) |properties|:opt
set ImageRep to the result
_NSImage's unlock()
set fURL to __NSURL__(_NSImage's path)
if fURL = missing value then set fURL to _NSImage's file
set ext to _NSImage's name extension
if fURL's pathExtension() as text ≠ ext then ¬
set fURL to fURL's URLByDeletingPathExtension()'s ¬
URLByAppendingPathExtension:ext
ImageRep's writeToURL:fURL atomically:yes
if the result = true then return fURL's |path|() as text
false
end write
---------------------------------------------------------------------------❮END❯
As you noted, there doesn't appear to be an equivalent Foundation class option for the Core Graphics' kCGColorConversionBlackPointCompensation when converting colour spaces. So I may not have provided you with anything script-wise that you weren't already able to do. What I did observe, however, is that the GRACoL colour profiles cause the AppleScript engine to crash if one tries to utilise them "as is" after obtaining them from the website. For whatever reason, the profile must first be opened in the ColorSync Utility.app, and then saved (Save As...) or exported (Export...). Overwriting the original file is fine, after which AppleScript appears content to use it. This doesn't appear to be an issue with other profiles already saved on the system.

How to get PDF page location without creating new array

Is it possible to find out just the locations of the PDF pages in the byte array?
At the moment I parse the full PDF in order to get the page bytes:
public static List<byte[]> splitPdf(byte[] pdfDocument) throws Exception {
    // try-with-resources closes the source document once all pages are serialized
    try (PDDocument document = PDDocument.load(new ByteArrayInputStream(pdfDocument))) {
        List<PDDocument> pageDocs = new Splitter().split(document);
        return pageDocs.stream()
                .map(PDFUtils::getResult).collect(Collectors.toList());
    }
}
private static byte[] getResult(PDDocument pd) {
    // close each single-page document after saving it
    try (PDDocument page = pd; ByteArrayOutputStream out = new ByteArrayOutputStream()) {
        page.save(out);
        return out.toByteArray();
    } catch (IOException e) {
        // a lambda used in Stream.map cannot throw checked exceptions
        throw new UncheckedIOException(e);
    }
}
My code works very well, but I create an additional List<byte[]> to hold the page bytes. I would like to have just the byte locations: if I know the byte indexes of a page (page start location, page end location), I can extract it from the main byte array.
So maybe I can find this information in the PDF header or somewhere else...
Right now I'm trying to optimize memory, because I parse hundreds of documents in parallel. So I don't want to create duplicate arrays.
"If I know byte indexes of page (page start location, page end location) I'll extract this from main byte array."
As @Amedee already hinted at in a comment, there is not simply one section of the PDF for each page.
A pdf is constructed from multiple objects (content streams, font resources, image resources,...) and two pages may use the same objects (e.g. use the same fonts or images). Furthermore, a pdf may contain unused objects.
So already the sum of the sizes of your partial pdfs may be smaller than, greater than, or even equal to the size of the full pdf.
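A toy calculation makes the size argument concrete. This is only a sketch under simplifying assumptions: the object names and sizes are invented, and file overhead such as headers, cross-reference tables and trailers is ignored.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class SharedObjectSizes {
    // Total size of all objects stored in the full file, used or not.
    static long fullSize(Map<String, Integer> objectSizes) {
        long total = 0;
        for (int size : objectSizes.values()) {
            total += size;
        }
        return total;
    }

    // Sum of the sizes of the per-page files; a shared object is counted once per page.
    static long sumOfSplitSizes(Map<String, Integer> objectSizes,
                                Map<String, List<String>> objectsPerPage) {
        long total = 0;
        for (List<String> ids : objectsPerPage.values()) {
            for (String id : ids) {
                total += objectSizes.get(id);
            }
        }
        return total;
    }

    public static void main(String[] args) {
        // Invented sizes in bytes, for illustration only.
        Map<String, Integer> sizes = new LinkedHashMap<>();
        sizes.put("font", 500);
        sizes.put("image", 2000);
        sizes.put("content1", 100);
        sizes.put("content2", 120);
        sizes.put("unused", 300);

        Map<String, List<String>> pages = new LinkedHashMap<>();
        pages.put("page1", Arrays.asList("content1", "font", "image"));
        pages.put("page2", Arrays.asList("content2", "font"));

        System.out.println(fullSize(sizes));               // 3020
        System.out.println(sumOfSplitSizes(sizes, pages)); // 3220
    }
}
```

Here the shared font makes the split files larger in total than the original; with a bigger unused object the total would instead come out smaller. Either way, a page does not correspond to one contiguous byte range of the original file.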

Confused by CaretOffset/LanguageItem methods

I am trying to find out over which source file element the cursor is located (code is inside a pad)
//Obtain document
Document sf = IdeApp.Workbench.ActiveDocument;
//out argument
DocumentRegion dr;
//Call using offset
Microsoft.CodeAnalysis.ISymbol o = sf.GetLanguageItem(sf.Editor.CaretOffset , out dr);
The ISymbol returned "o" is Object's Equals. The document sf is a simple class with a parameterless constructor. The cursor is inside the constructor. I was expecting my class constructor.
Where is the error?
OK, I found a workaround to get context data for the current editor caret offset. It requires obtaining the AnalysisDocument from the current document, then the SemanticModel of that document, and finally calling GetEnclosingSymbol with the caret offset.

iTextSharp: Convert PdfObject to PdfStream

I am attempting to pull some font streams out of a pdf file (legality is not an issue, as my company has paid for the rights to display these documents in their original manner - and this requires a conversion which requires the extraction of the fonts).
Now, I had been using MUTool - but it also extracts the images in the pdf as well with no method for bypassing them and some of these contain 10s of thousands of images. So, I took to the web for answers and have come to the following solution:
I get all of the fonts into a font dictionary and then I attempt to convert them into PdfStreams (for flatedecode and then writing to files) using the following code:
PdfDictionary tg = (PdfDictionary)PdfReader.GetPdfObject((PdfObject)cItem.pObj);
PdfName type = (PdfName)PdfReader.GetPdfObject(tg.Get(PdfName.SUBTYPE));
try
{
int xrefIdx = ((PRIndirectReference)((PdfObject)cItem.pObj)).Number;
PdfObject pdfObj = (PdfObject)reader.GetPdfObject(xrefIdx);
PdfStream str = (PdfStream)(pdfObj);
byte[] bytes = PdfReader.GetStreamBytesRaw((PRStream)str);
}
catch { }
But, when I get to PdfStream str = (PdfStream)(pdfObj); I get the error below:
Unable to cast object of type 'iTextSharp.text.pdf.PdfDictionary'
to type 'iTextSharp.text.pdf.PdfStream'.
Now, I know that PdfDictionary derives from (extends) PdfObject so I am uncertain as to what I am doing incorrectly here. Someone please help - I either need advice on patching this code, or if entirely incorrect, either code to extract the stream properly or direction to a place with said code.
Thank you.
EDIT
My revised code is here:
public static void GetStreams(PdfReader pdf)
{
int page_count = pdf.NumberOfPages;
for (int i = 1; i <= page_count; i++)
{
PdfDictionary pg = pdf.GetPageN(i);
PdfDictionary res = (PdfDictionary)PdfReader.GetPdfObject(pg.Get(PdfName.RESOURCES));
PdfDictionary fObj = (PdfDictionary)PdfReader.GetPdfObject(res.Get(PdfName.FONT));
if (fObj != null)
{
foreach (PdfName name in fObj.Keys)
{
PdfObject obj = fObj.Get(name);
if (obj.IsIndirect())
{
PdfDictionary tg = (PdfDictionary)PdfReader.GetPdfObject(obj);
PdfName type = (PdfName)PdfReader.GetPdfObject(tg.Get(PdfName.SUBTYPE));
int xrefIdx = ((PRIndirectReference)obj).Number;
PdfObject pdfObj = pdf.GetPdfObject(xrefIdx);
if (pdfObj != null && pdfObj.IsStream())
{
PdfStream str = (PdfStream)(pdfObj);
byte[] bytes = PdfReader.GetStreamBytesRaw((PRStream)str);
}
}
}
}
}
}
However, I am still receiving the same error - so I am assuming that this is an incorrect method of retrieving font streams. The same document has had fonts extracted using muTool successfully - so I know the problem is me and not the pdf.
There are at least two things wrong in your code:
1) You cast an object to a stream without performing this check: if (pdfObj != null && pdfObj.isStream()) { // cast to stream } As you get the error message that you're trying to cast a dictionary to a stream, I'm 99% sure that the second part of the check will return false whereas pdfObj.isDictionary() probably returns true.
2) You try extracting a stream from PdfReader and you're trying to cast that object to a PdfStream instead of to a PRStream. PdfStream is the object we use to create PDFs, PRStream is the object used when we inspect PDFs using PdfReader.
You should fix this problem first.
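To see why the cast fails even though the types are related, consider this minimal sketch in Java. The classes Obj, DictObj and StreamObj are hypothetical stand-ins that only mirror the inheritance chain PdfObject, PdfDictionary, PdfStream; they are not iText classes. Every stream is a dictionary, but a plain dictionary is not a stream, so the downcast has to be guarded by a type check.

```java
import java.util.Optional;

// Hypothetical stand-ins mirroring PdfObject <- PdfDictionary <- PdfStream; not iText classes.
class Obj { }

class DictObj extends Obj { }

class StreamObj extends DictObj {
    final byte[] raw;
    StreamObj(byte[] raw) { this.raw = raw; }
}

final class SafeStreamCast {
    // Returns the raw bytes only if the object really is a stream.
    static Optional<byte[]> rawBytes(Obj obj) {
        if (obj instanceof StreamObj) { // false for null and for plain dictionaries
            return Optional.of(((StreamObj) obj).raw);
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(rawBytes(new StreamObj(new byte[] {1, 2})).isPresent()); // true
        System.out.println(rawBytes(new DictObj()).isPresent());                    // false
        System.out.println(rawBytes(null).isPresent());                             // false
    }
}
```

In the code from the question, the object behind a font resource entry is a font dictionary, so a guarded cast like this would simply skip it instead of throwing.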
Now for your general question. If you read ISO-32000-1, you'll discover that a font is defined using a font dictionary. If the font is embedded (fully or partly), the font dictionary will refer to a stream. This stream can contain the full font information, but most of the time, you'll only get a subset of the glyphs (because that's best practice when creating a PDF).
Take a look at the example ListFontFiles from my book "iText in Action" to get a first impression of how fonts are organized inside a PDF. You'll need to combine this example with ISO-32000-1 to find more info about the difference between FONTFILE, FONTFILE2 and FONTFILE3.
I've also written an example that replaces an unembedded font with a font file: EmbedFontPostFacto. This example serves as an introduction to explain how difficult font replacement is.
Please go to http://tinyurl.com/iiacsCH16 if you need the C# version of the book samples.