Can a PDF-1.3 be a PDF/A?

I have a PDF document which is supposed to be PDF/A conformant, but the metadata states that it is a PDF-1.3 document. Can a PDF-1.3 document be conform with the rules of PDF/A?
Note that the first version of PDF/A is based on PDF-1.4 - hence my confusion.
Thanks in advance!

The PDF/A-1 specification (ISO 19005 part 1) states
5.1 General
This part of ISO 19005 defines a file format for representing electronic documents known as “PDF/A-1.”
Conforming PDF/A-1 files shall adhere to all requirements of PDF Reference as modified by this part of
ISO 19005.
"PDF Reference" is defined earlier as
4 Notation
...
For the purposes of this part of ISO 19005, references to the “PDF Reference” are to PDF Reference: Adobe
Portable Document Format, version 1.4, 3rd ed., as amended by Errata for PDF Reference, 3rd ed. [...]
Section 5.1 continues:
Neither the version number in the header of a PDF file nor the value of the Version key in the document catalog dictionary shall be used in determining whether a file is
in accordance with this part of ISO 19005.
As these are the metadata by which a file declares itself a PDF-1.3 document, this version statement MUST NOT be used in determining whether a file is PDF/A-1.
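To make the point concrete, here is a minimal Python sketch of how a tool might read the version a file declares: the header version, overridden by a catalog Version entry if that names a later version (ISO 32000-1, Table 28). The function name declared_version is hypothetical, and the catalog lookup is simplified to a plain regex search instead of resolving the catalog through the trailer's Root entry. Per the quoted Section 5.1, a PDF/A-1 validator must not base its verdict on this value at all; the sketch only shows where a "PDF-1.3" claim lives.

```python
import re


def declared_version(data: bytes) -> str:
    """Version a PDF claims: the header version, unless a /Version entry
    names a later one (ISO 32000-1, Table 28)."""
    header = re.match(rb"%PDF-(\d+)\.(\d+)", data)
    if header is None:
        raise ValueError("missing %PDF-x.y header")
    version = (int(header.group(1)), int(header.group(2)))
    # Simplification (assumption): treat any "/Version /x.y" in the file as
    # the catalog entry rather than locating the catalog via /Root.
    override = re.search(rb"/Version\s*/(\d+)\.(\d+)", data)
    if override:
        later = (int(override.group(1)), int(override.group(2)))
        version = max(version, later)
    return f"{version[0]}.{version[1]}"


sample = b"%PDF-1.3\n%\xe2\xe3\xcf\xd3\n1 0 obj\n<</Type/Catalog/Pages 2 0 R>>\nendobj\n"
print(declared_version(sample))  # 1.3
```

Whatever this returns, PDF/A-1 conformance is decided by the ISO 19005-1 requirements on top of PDF Reference 1.4, not by this number.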
Thus, concerning your question:
stijndg> Can a PDF-1.3 document be conform with the rules of PDF/A?
Yes, it can.
It merely has to adhere to all requirements of PDF Reference, version 1.4, and to the requirements of ISO 19005 part 1 for Level A or Level B conformance.
Furthermore Section 5.1 recommends:
Features described in PDF specifications prior to Version 1.4 which are not explicitly
described in PDF Reference should not be used.
But "should" indicates that this is a recommendation, so if for some reason the use of such features cannot be avoided, that alone does not keep a PDF from being PDF/A conformant.

What standard is used by a "hybrid PDF file"?

I need to create open and easily readable "PDF with source content" (also called hybrid PDF) with a software tool like Prince or PDFreactor. This DocumentFoundation FAQ explains what a hybrid PDF is, but does not say which standard it uses:
PDF/A-3 of ISO 19005-3?
Other (simplest?) ISO 19005 feature?
Something of the ISO 32000? (something as embedded files?)
Detecting the standard of a known hybrid PDF file would be a good alternative approach... But some people say that it is impossible to detect the standard used in the PDF Hybrid file.
First of all, this Hybrid PDF appears not to be specified in an independent standard, i.e. there is no corresponding ISO/ETSI/ANSI/... standard for it.
That being said, it apparently is a particularly prominent feature of LibreOffice PDF exports:
(from the LibreOffice Writer FAQ on hybrid PDFs)
Inspecting such a file (e.g. this one), one sees that there are additional entries in the PDF trailer:
...
trailer
<</Size 128/Root 126 0 R
/Info 127 0 R
/ID [ <518EBB4C2FE2F6B638478335A7ED9CA4>
<518EBB4C2FE2F6B638478335A7ED9CA4> ]
/DocChecksum /7B00A6EE0349EB2EA1DFB5ECC5899A7C
/AdditionalStreams [/application#2Fvnd#2Eoasis#2Eopendocument#2Etext 66 0 R
]
>>
startxref
291605
%%EOF
and that referenced additional stream in object 66 indeed contains the source OpenOffice document.
Apparently applications supporting these hybrid PDF files inspect the value of that AdditionalStreams trailer entry, and if they know how to handle the given document type (/application#2Fvnd#2Eoasis#2Eopendocument#2Etext here corresponds to application/vnd.oasis.opendocument.text), they provide a way to extract that embedded document and open it for editing.
Beware: Unless I overlooked some ISO norm, those extra entries strictly speaking are forbidden by the PDF specification ISO 32000: In the trailer there may only be entries with keys that are either defined for the trailer in ISO specifications or second-class names. Neither AdditionalStreams nor DocChecksum is ISO specified or a second-class name. Thus, strictly speaking, those hybrid PDFs are invalid PDFs.
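A minimal Python sketch of the inspection described above, assuming a classic (non-stream) trailer like the one shown: it tokenizes just enough PDF syntax (dictionaries, arrays, names, hex strings, integers, indirect references) to read the last trailer dictionary, decodes #xx escapes in names, lists the AdditionalStreams entries, and flags keys outside ISO 32000-1's trailer keys (Table 15, plus XRefStm from Table 19). All function names are hypothetical. The pairing of a MIME-type name with a stream reference in AdditionalStreams is inferred from this one sample file, and the sketch does not check second-class prefixes against the registry, so it flags every key it does not know.

```python
import re

# Just enough PDF syntax for a classic trailer: dictionary and array
# delimiters, hex strings, names, integers, and the R of indirect references.
TOKEN = re.compile(r"<<|>>|\[|\]|<[0-9A-Fa-f\s]*>|/[^\s/\[\]<>()]*|-?\d+|R")

# Trailer keys defined in ISO 32000-1: Table 15, plus XRefStm (Table 19).
STANDARD_TRAILER_KEYS = {"Size", "Prev", "Root", "Encrypt", "Info", "ID", "XRefStm"}


def decode_name(token):
    """Strip the leading slash and resolve #xx escapes, e.g. /a#2Fb -> a/b."""
    return re.sub(r"#([0-9A-Fa-f]{2})", lambda m: chr(int(m.group(1), 16)), token[1:])


def parse(tokens, i):
    """Parse one object starting at tokens[i]; return (value, next index)."""
    tok = tokens[i]
    if tok == "<<":
        result, i = {}, i + 1
        while tokens[i] != ">>":
            key = decode_name(tokens[i])
            result[key], i = parse(tokens, i + 1)
        return result, i + 1
    if tok == "[":
        items, i = [], i + 1
        while tokens[i] != "]":
            item, i = parse(tokens, i)
            items.append(item)
        return items, i + 1
    if tok.startswith("/"):
        return ("name", decode_name(tok)), i + 1
    if tok.startswith("<"):
        return ("hex", re.sub(r"\s", "", tok[1:-1])), i + 1
    # An integer, or the object number of an indirect reference "n g R".
    if i + 2 < len(tokens) and tokens[i + 2] == "R":
        return ("ref", int(tok), int(tokens[i + 1])), i + 3
    return int(tok), i + 1


def inspect_trailer(pdf_tail):
    """Return (AdditionalStreams entries, trailer keys not in ISO 32000-1)."""
    if "trailer" not in pdf_tail:
        raise ValueError("no classic trailer (cross-reference stream file?)")
    body = pdf_tail.rsplit("trailer", 1)[1].split("startxref", 1)[0]
    trailer, _ = parse(TOKEN.findall(body), 0)
    extra = trailer.get("AdditionalStreams", [])
    # Assumption: the array alternates a MIME-type name and a stream reference.
    streams = [(kind[1], ref[1]) for kind, ref in zip(extra[0::2], extra[1::2])]
    unknown = sorted(k for k in trailer if k not in STANDARD_TRAILER_KEYS)
    return streams, unknown


SAMPLE = """trailer
<</Size 128/Root 126 0 R
/Info 127 0 R
/ID [ <518EBB4C2FE2F6B638478335A7ED9CA4>
<518EBB4C2FE2F6B638478335A7ED9CA4> ]
/DocChecksum /7B00A6EE0349EB2EA1DFB5ECC5899A7C
/AdditionalStreams [/application#2Fvnd#2Eoasis#2Eopendocument#2Etext 66 0 R
]
>>
startxref
291605
%%EOF"""

streams, unknown = inspect_trailer(SAMPLE)
print(streams)  # [('application/vnd.oasis.opendocument.text', 66)]
print(unknown)  # ['AdditionalStreams', 'DocChecksum']
```

Taking the last "trailer" keyword is a shortcut for incrementally updated files; a real parser would follow startxref and the Prev chain instead.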

What is the minimum PDF version for PAdES?

I'm generating a PDF file with Puppeteer. It automatically generates a PDF-1.4 file. Then, I use DSS to digitally sign it with a PAdES signature. The resulting file can be opened in PDF viewers and PDFStudio seems to correctly parse the document signature.
Is this valid however?
Wikipedia states that PDF/A-2 (which is based on PDF 1.7) added support for PAdES.
Do I need to generate at least PDF-1.7 (or PDF/A-2) to have a valid PDF file with a valid signature?
Note: I use the term valid in both the technical and the legal sense.
The PAdES norm ETSI EN 319 142 characterizes
PAdES signatures profiled in the present document build on PDF signatures specified in ISO 32000-1 with an alternative signature encoding to support digital signature formats equivalent to the signature format CAdES.
The PDF norm ISO 32000-1 characterizes
ISO 32000 specifies a digital form for representing documents called the Portable Document Format or usually referred to as PDF. PDF was developed and specified by Adobe Systems Incorporated beginning in 1993 and continuing until 2007 when this ISO standard was prepared. The Adobe Systems version PDF 1.7 is the basis for this ISO 32000 edition. The specifications for PDF are backward inclusive, meaning that PDF 1.7 includes all of the functionality previously documented in the Adobe PDF Specifications for versions 1.0 through 1.6. It should be noted that where Adobe removed certain features of PDF from their standard, they too are not contained herein.
(This may sound a bit confusing: on the one hand PDF 1.7 includes all of the functionality previously documented in the Adobe PDF Specifications for versions 1.0 through 1.6, on the other hand Adobe removed certain features of PDF from their standard. Indeed, some features were removed, but I don't believe your PDF 1.4 files are affected.)
Thus, a PDF file like yours claiming version 1.4 is, by backward inclusiveness, also a PDF 1.7 file and as such can be signed with PAdES signatures.
Thus, yes, PDF 1.4 files can (technically) validly be signed with PAdES signatures. (Unless, obviously, your files explicitly disallow this.)
(Actually one can also view PAdES signatures as adopted in ISO 32000-2; in this case your PDF 1.4 files by backward inclusiveness are also PDF 2.0 and as such can be signed using PAdES signatures as specified there.)
You also enquire about legal aspects. First of all, I am not a lawyer, so don't consider this formal legal consultation.
To start with, though, you have to make clear in which legal system you want to investigate legal validity.
While PAdES was originally defined in the context of European Union signature regulations, a number of other countries have also adopted PAdES as the standard for their preferred PDF signatures.
So: Are you wondering about validity as signatures in the context of EU eIDAS signatures? Are you considering specific regulations of EU member states? Or are you wondering about the situation in other countries outside the EU?
In the EU your PAdES signatures should be generally accepted. Even though there may be special member state regulations in specific contexts, they should only influence your choice of the PAdES profile you request for your signatures from DSS; they should not render your PDF 1.4 source PDFs unusable for PAdES signing.
I don't know specifics about non-EU legal systems with a PAdES preference. But I would indeed be surprised if any of them cared that your PDFs are PDF 1.4.
In comments the question arose whether the signed file is also still a valid PDF 1.4 and if not whether the version 1.4 in the file header would be a concern.
Obviously PDF 1.4 does not know the details of the PAdES signature encodings. Fortunately, though, the PDF Reference 1.4 actually does not know any specific signature encoding at all! Thus, no signature encoding is invalid as long as it follows the very few rules present in the PDF 1.4 reference, and PAdES signatures do so.
Furthermore, the PDF 1.4 reference allows
A PDF producer or Acrobat plug-in extension may also add keys to any PDF object that is implemented as a dictionary, except the file trailer dictionary (see Section 3.4.4, “File Trailer”).
Thus, any keys added while applying the PAdES signature which are not defined in PDF 1.4 are harmless.
Thus, the PDF 1.4 files with PAdES signatures added are also still valid PDF 1.4 files. Obviously, though, a plain PDF 1.4 viewer does not know how to validate the PAdES signatures. But as it does not know how to validate any signatures at all, that's of no concern.

Digital Signature in PDF doesn't verify as matching after adding annotations

I was going through the official PDF spec. I came across a digitally signed PDF here. While I was analyzing its catalog dictionary, I saw this:
The digital signature is in the form of a signature field, which specifies the byte range of the content to which the signature applies. Any content added on top of it, like annotations, notes, etc., should go in as incremental updates, so the validity of the original content should continue to hold true (excluding direct editing of the content, like changing the Sample word to Sample2). However, when I open the file in Nitro, add some highlight or notes to it, save it and open it in Acrobat, it now says that the signature is invalid. Which brings me to my questions:
Why is Acrobat showing it as invalid? The signature field does not enforce prevention from adding incremental updates, why exactly is it invalid?
Why is Acrobat not allowing addition of notes or highlights? Nitro allows it, for example. There is no Perms dictionary which would specify a DocMDP level restriction, so what exactly it is that Adobe is interpreting as a document level lock?
As already explained in my answer to your previous question on this topic, the file you call "the official PDF spec" is anything but. The official PDF specification is ISO 32000-1 (since 2008) and ISO 32000-2 (the 2017 update).
That answer also points out the origin of the P entry in the FieldMDP transform dictionary your screenshot shows:
It comes from the Lock dictionary of the same signature field and is defined in the Adobe Supplement to ISO 32000, Extension Level 3 (which, being from Adobe, unfortunately references the PDF Reference 1.7 instead of ISO 32000-1):
P number (Optional; Extension Level 3) The access permissions granted for this document. Valid values follow:
1, no changes to the document are permitted; any change to the document invalidates the signature.
This extension to ISO 32000-1 has been added to the standard ISO 32000-2.
Thus,
Why is Acrobat showing it as invalid? The signature field does not enforce prevention from adding incremental updates, why exactly is it invalid?
Because it does prevent any change, see above.
Why is Acrobat not allowing addition of notes or highlights? Nitro allows it, for example. There is no Perms dictionary which would specify a DocMDP level restriction, so what exactly it is that Adobe is interpreting as a document level lock?
Probably because Nitro (at least the version you tested) merely supports ISO 32000-1, but neither Adobe's Extension Level 3 to it nor ISO 32000-2.
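The answer only quotes value 1 of the P entry. For illustration, the Python sketch below assumes the Lock dictionary's P uses the same 1/2/3 scale as the DocMDP P entry defined in ISO 32000-1 (Table 254); check ISO 32000-2 before relying on that. It models the decision a validator such as Acrobat makes for an incremental update. The Change categories and the function name are hypothetical simplifications: a real validator has to diff the incremental update itself to classify it.

```python
from enum import Enum


class Change(Enum):
    """Hypothetical coarse categories for what an incremental update does."""
    FORM_FILL = "filling in form fields"
    PAGE_TEMPLATE = "instantiating page templates"
    SIGNING = "adding a signature"
    ANNOTATION = "creating, deleting or modifying annotations"
    OTHER = "any other change"


# Permitted changes per P value, following the DocMDP P definition in
# ISO 32000-1 (Table 254); assumed here to apply to the Lock P entry as well.
PERMITTED = {
    1: frozenset(),
    2: frozenset({Change.FORM_FILL, Change.PAGE_TEMPLATE, Change.SIGNING}),
    3: frozenset({Change.FORM_FILL, Change.PAGE_TEMPLATE, Change.SIGNING,
                  Change.ANNOTATION}),
}


def keeps_signature_valid(p: int, change: Change) -> bool:
    """True if a validator honoring P should accept this kind of update."""
    return change in PERMITTED[p]


# The situation in the question: a highlight added under P = 1.
print(keeps_signature_valid(1, Change.ANNOTATION))  # False
print(keeps_signature_valid(3, Change.ANNOTATION))  # True
```

A tool that ignores the Lock entry effectively behaves as if no restriction were present, which matches what the question observed with Nitro.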

PDF file header sequence: Why is the '25 e2 e3 cf d3' byte sequence used in so many documents?

I know that it informs a reader whether the PDF contains binary data or not.
But why "25 e2 e3 cf d3" and not random binary? So many documents have exactly that.
Is it just because so many use the same PDF library?
Refs:
PDF format. function of %-started sequence
comp.text.pdf>pdf format
Looking through the PDFs I have here, it looks like a number of PDF processors use these very characters "%âãÏÓ", among them Adobe products.
Not all of those processors use the same basic PDF library, so the use of the same letters cannot be explained by something like that.
Most likely it is due to the fact that Adobe software creates PDFs with that second line comment. For many years developers of other software used example files produced by Adobe software as templates for the PDFs they created.
Yes, the specification ISO 32000-1 merely requires
If a PDF file contains binary data, as most do (see 7.2, "Lexical Conventions"), the header line shall be immediately followed by a comment line containing at least four binary characters—that is, characters whose codes are 128 or greater.
(and the earlier PDF references also recommend the same), so there is no need to use the same binary characters.
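A minimal Python sketch of what the quoted rule actually demands, with a hypothetical function name: the second line must be a comment containing at least four bytes with codes of 128 or greater. It also shows that the Adobe-style "%âãÏÓ", encoded as Latin-1, is exactly the byte sequence 25 e2 e3 cf d3 from the question, while any other four high bytes satisfy the rule equally well.

```python
import re


def has_binary_comment(data: bytes) -> bool:
    """True if the header line is followed by a comment line containing at
    least four bytes with codes >= 128 (ISO 32000-1, 7.5.2)."""
    lines = re.split(rb"\r\n|\r|\n", data, maxsplit=2)
    if len(lines) < 2 or not lines[0].startswith(b"%PDF-"):
        return False
    second = lines[1]
    return second.startswith(b"%") and sum(b >= 128 for b in second) >= 4


# The Adobe-style marker "%âãÏÓ" in Latin-1.
adobe_marker = "%âãÏÓ".encode("latin-1")
print(adobe_marker.hex(" "))                                       # 25 e2 e3 cf d3
print(has_binary_comment(b"%PDF-1.7\n" + adobe_marker + b"\n"))    # True
# Any other four high bytes are just as valid.
print(has_binary_comment(b"%PDF-1.7\n%\x80\x81\x82\x83\n"))        # True
```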
But there also is no reason not to use them. Why deviate from the working example files produced by Adobe software in this regard?
Especially in the years before the ISO specification, when only the PDF references existed, one tended to make the created document structure as Adobe-like as possible, because Adobe did not consider the PDF references normative. Thus, even if your document was valid according to the references, Adobe viewers could still reject it without that counting as a bug...

PDF and PAdES (PDF/A-2 and PDF 2.0)

PDF/A-2
PDF/A-2 carries over provisions from the ETSI/PAdES standard. (source)
Is PAdES mandatory for PDF/A-2? Or are other signature formats in PDF/A-2 also allowed?
PDF 2.0
ETSI will feed these European-specific elements back into ISO for
inclusion in the next release of the PDF standard, ISO 32000-2. (source)
Will PAdES be mandatory in the new PDF 2.0 standard? Or will PDF 2.0 be compatible to PAdES and other signature formats will be allowed, too?
Is PAdES mandatory for PDF/A-2? Or are other signature formats in PDF/A-2 also allowed?
PAdES is not mandatory. In particular the name "PAdES" is not even mentioned in the PDF/A-2 specification. Merely some requirements have parallels to PAdES requirements, e.g.
When computing the digest for the file, it shall be computed over the entire file, including the signature
dictionary but excluding the PDF Signature itself. This range is then indicated by the ByteRange entry
of the signature dictionary.
This is also a requirement introduced by PAdES signatures; in ISO 32000-1 it merely was a recommendation. De facto, though, Adobe Reader had made it a requirement long before.
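The ByteRange requirement can be checked mechanically. ByteRange is an array of offset/length pairs, so for a single signature it reads [a b c d]. The Python sketch below, with hypothetical names and a toy byte layout that is not a complete PDF, tests that the first range starts at offset 0, the second ends at the end of the file, and the single excluded gap is exactly the hex string holding the signature container.

```python
from typing import Sequence

HEX_DIGITS = b"0123456789abcdefABCDEF"


def byte_range_covers_file(data: bytes, byte_range: Sequence[int]) -> bool:
    """True if ByteRange [a b c d] signs the whole file except one gap that
    is exactly the <hex> string holding the signature."""
    a, b, c, d = byte_range
    # First range must start at 0, second range must end at end of file.
    if a != 0 or c + d != len(data):
        return False
    # There must be a non-empty gap between the two ranges.
    if a + b >= c:
        return False
    gap = data[a + b:c]
    return (gap.startswith(b"<") and gap.endswith(b">")
            and all(ch in HEX_DIGITS for ch in gap[1:-1]))


# Toy layout: prefix, signature placeholder, suffix.
head = b"%PDF-1.7\n1 0 obj\n<</Type/Sig/Contents "
sig = b"<" + b"00" * 8 + b">"  # placeholder for the CMS container
tail = b">>\nendobj\n%%EOF\n"
data = head + sig + tail

good = [0, len(head), len(head) + len(sig), len(tail)]
bad = [0, len(head), len(head) + len(sig), len(tail) - 5]
print(byte_range_covers_file(data, good))  # True
print(byte_range_covers_file(data, bad))   # False: end of file not covered
```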
Although ISO 32000-1 also allows the value of the Contents entry of signature dictionary to be a DER-encoded
PKCS#1 binary data object, that format is not recommended.
This effectively recommends the use of PKCS#7/CMS signature container based PDF signatures. This also parallels PAdES which actually goes a step further and requires it. But naked PKCS#1 object based PDF signatures are not fashionable anyway, not even in the plain PDF world.
ISO 32000-1:2008 allows the inclusion of one or more RFC 3281 attribute certificates to be associated
with the signer certificate. However, a conforming writer should not include them as they are not widely
supported and hence use of this attribute will reduce interoperability.
There is a similarly formulated recommendation in PAdES part 2 (Basic Profile). In PAdES part 3 (PAdES-BES and PAdES-EPES Profiles) attribute certificates are even forbidden. But nowadays attribute certificates in general are not in fashion anymore.
Thus, there are parallels but that's it.
Will PAdES be mandatory in the new PDF 2.0 standard? Or will PDF 2.0 be compatible to PAdES and other signature formats will be allowed, too?
As PDF 2.0 has not yet been published, this strictly speaking is speculation.
In the last draft I could read, though, PAdES signatures have been added to the existing formats and only adbe.pkcs7.sha1 signatures have been deprecated. As SHA-1 (which that format uses at least once) is no longer trusted, this format should not be used anyway, even in current PDFs.
Thus, neither PDF/A-2 nor ISO 32000-2 enforce PAdES style signatures, adbe.pkcs7.detached style signatures are still valid options. If interoperability and long term validation features are of interest, though, PAdES most likely is the better choice.
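In a given file the signature format is identified by the SubFilter entry of the signature dictionary. The Python sketch below maps the SubFilter values corresponding to the formats discussed above onto the answer's assessment; the table, the function name and the regex-based lookup are illustrative assumptions, not a complete list or a real validator (escaped #xx characters in names, for instance, are not decoded).

```python
import re

# SubFilter values for the formats discussed above.
SUBFILTERS = {
    "ETSI.CAdES.detached": "PAdES (CAdES-based CMS container)",
    "adbe.pkcs7.detached": "CMS container, still a valid option in ISO 32000-2",
    "adbe.pkcs7.sha1": "deprecated in ISO 32000-2 (relies on SHA-1)",
    "adbe.x509.rsa_sha1": "naked PKCS#1 signature, not recommended by PDF/A-2",
}


def classify_signature(sig_dict: bytes) -> str:
    """Look up the /SubFilter name of a signature dictionary's raw bytes."""
    m = re.search(rb"/SubFilter\s*/([A-Za-z0-9.#_]+)", sig_dict)
    if m is None:
        return "no SubFilter entry"
    return SUBFILTERS.get(m.group(1).decode("ascii"), "not covered by this sketch")


sig = b"<</Type/Sig/Filter/Adobe.PPKLite/SubFilter/ETSI.CAdES.detached/ByteRange[0 1 2 3]>>"
print(classify_signature(sig))  # PAdES (CAdES-based CMS container)
```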