Metadata for digital signature - cryptography

Is possible to append to PKCS #7 or other formats of digital signature data that should not be signed? For example, I have a signature of the invoice document and I want to append some metadata, such as email or user after signing, after signing.

Yes it is possible. Look for unsigned attributes as a keyword, e.g. in https://www.rfc-editor.org/rfc/rfc3852#section-11.4. This is used to add counter signatures, timestamps, validation data and other stuff in various signature formats. These attributes are not included in hash calculation and can therefore be added and changed after signature creation.

Related

How do PDF readers validate form fields?

I was looking at the source code of several pdf files which were digitally signed (and had annotations and form fields as well).
I noticed that each "Annot" dictionary has a "M" value which stores the latest time it was modified - which can then be checked with the "M" value for the "Sig" dictionary which stores when the pdf file was digitally signed.
However, I noticed that dictionaries with type "XObject" and subtype "Form" do not have an "M" value - i.e. do not store the time at which said form was modified. In such cases, how do pdf readers validate whether a change to form field is allowed (for eg, in a digital sign where no changes are allowed, no form field can be changed after the digital sign is done - how is this verified?)?
I just attached an example pdf at this link:
https://www.mediafire.com/file/q8ed9rkf35kgxgq/output.txt/file
Some Misconceptions
There apparently are a number of misconceptions to clear up here.
I noticed that each "Annot" dictionary has a "M" value which stores the latest time it was modified - which can then be checked with the "M" value for the "Sig" dictionary which stores when the pdf file was digitally signed.
The M entry of annotation dictionaries is optional, so you cannot count on it being there.
The format of the annotation M value essentially only is a String; it is merely recommended to contain dates as specified in the PDF specification but not required, so you might find a value like "my mother's 42nd birthday" in it.
The annotation M value is not backed by a digital timestamp, so a forger could put anything there.
Furthermore, the M entry of a signature dictionary also is optional, and by itself it also cannot be trusted.
Thus, no, this represents no means to validate anything.
However, I noticed that dictionaries with type "XObject" and subtype "Form" do not have an "M" value - i.e. do not store the time at which said form was modified. In such cases, how do pdf readers validate whether a change to form field is allowed (for eg, in a digital sign where no changes are allowed, no form field can be changed after the digital sign is done - how is this verified?)?
First of all, as explained above, the M values cannot be used at all for modification detection, so whether some objects do or do not have them, is irrelevant.
Furthermore, a form XObject by itself is not a form field meant by the document modification detection and prevention settings of a signature. The form fields meant are AcroForm form fields (or, in a deprecated special case, XFA form fields). A form XObject may be used as appearance of an AcroForm form field but in that case the pivotal check point is the form field itself.
How to Validate Changes
(For some backgrounds on PDF signatures you may want to read this answer first.)
Depending on the document modification detection and prevention (DocMDP) settings of the signatures of a document only certain changes are allowed to a document, see this answer.
But even the allowed changes may not be applied by changing the original objects in the PDF. That would after all change the digest over the originally signed byte ranges and so invalidate the signature. Thus, the changed and added objects are appended to the PDF, capped off by a cross reference table or streams for these objects.
Thus, what you have to do for DocMDP validation, is determining the extend of the signed revision in the PDF file, finding out that way what has been appended, and analyzing whether those additions change the signed revision in allowed or disallowed ways.
While this may sound simple at first, it is not, in particular because "allowed" and "disallowed" changes are characterized by their effects on document appearance and behavior, not by the actual PDF objects that may be affected.
Here currently ETSI working groups are attempting to transform those characterizations into criteria for PDF objects; the results are to be published as ETSI TS 119 102-3, probably in multiple parts.
Some Details
In comments you asked
how do you tell from the appearance of a modified object, whether it was added before or after the digital sign?
Well, as mentioned above:
First you determine the extend of the signed revision in the PDF file.
I.e. you take the ByteRange entry of the signature dictionary and take the start of the lower range and the end of the higher range. E.g. if that entry is
[ 0 67355 72023 6380 ]
the the signed revision starts at offset 0 and ends at offset 72023+6280-1=78302 inclusively.
(Obviously some sanity checks are indicated, in particular that the start offset is 0, that the gap in the signed byte ranges exactly contains the signature dictionary Contents value, that the signed revision as a whole is a valid PDF and all its cross references point to indirect objects completely contained in that signed revision, and that that signed revision indeed is a previous revision of the whole PDF, i.e. that the chain of cross reference streams or tables contains the cross reference stream or table of that revision.)
Then you find out that way what has been appended.
I.e. you compare the cross reference stream or table of the whole file and the cross reference stream or table of the signed revision.
If some object is referenced for an object number now but was not referenced for that object number in the signed revision, you found a change to check.
(Strictly speaking you should iterate along the chain of cross reference streams or tables from the signed revision to the whole file, i.e. revision by revision, and check the changes in each revision.)
For this procedure you obviously have to use the original file, not some version uncompressed by tools like qpdf, otherwise you cannot do the offset tests.
Is it possible for an attacker to add a new annotation object before the xref table corresponding to the digital signature, and adjust the previous xref table values, so that a broken document passes as an accepted document?
No. The signed revision including its cross references is covered by the signature. Manipulating those bytes will invalidate the signature mathematically.

Decoding SCT extension in X.509 certificate

Im currently trying create some code to decode the SCT extension in a X.509 Certificate (OID: 1.3.6.1.4.1.11129.2.4.2). It has mostly been successful, however the SCT signature turns out wrong for me when comparing my result to OpenSSL. An example:
Mine:
37:F6:....:E2:16:....
OpenSSL:
30:45:02:20:37:F6:....:02:21:00:E2:16:....
To clarify the "...." are the same for both. For all my test cases, the same pattern of:
30:45:02:20 and 02:21:00 come up. When trying to interpret these additional octets my guess is that these represent: 0x30 -> SEQUENCE and 0x02 -> INTEGER and the following octets represents length etc. Should these be included in the signature or is there something i have missed?

How to convert from a ESSCertId certHash to a x509SKI?

In a RFC 3161 Timestamp Token, How can I convert the certHash of the ESSCertID into a x509SKI?
For example, in this timestamp token the certHash is hex-encoded as
5A25B4B5D82C19118C496917A4EA53309A859DDB
which converted to base-64 is
WiW0tdgsGRGMSWkXpOpTMJqFnds=
but I know from the EU Trusted List that the x509SKI should be
t0SsyqkWH9a3+KEASdguv/5fY8g=
The ESSCertID structure contains the SHA-1 hash of the whole certificate.
The Subject Key Identifier is any of:
The SHA-1 hash of the public key, as the content bytes of the SubjectPublicKeyInfo encoding.
ski = option_1[13..20]; ski[0] = 0x40 | (ski[0] & 0x0F)
The SHA-1 hash of the public key, as the fully encoded SubjectPublicKeyInfo
Anything else that the CA decides.
So there's not really a calculatable step to get from one to the other. Especially because of (4).
Certificate Transparency logs do help, though, since you can search them for certificates by many criteria. https://crt.sh/?q=5A25B4B5D82C19118C496917A4EA53309A859DDB (searching for the SHA-1 value from the ESSCertID) found the certificate, and its Subject Key Identifier is B7:44:AC:CA:A9:16:1F:D6:B7:F8:A1:00:49:D8:2E:BF:FE:5F:63:C8, which is the hex version of your base64 string.

Signature verification failure due to reordering via _asn1_ordering_set_of

I am using GnuTLS 3.4.1. I have a x509 certificate with set of sequences inside. The certificate is stored that way on a smart card.
GnuTLS is rearranging the sequences via function _asn1_ordering_set_of, which appears to be causing a verification failure.
Here's what the sequence looks like:
SEQUENCE :
...
SET :
SEQUENCE :
OBJECT_IDENTIFIER : 'CN (2.5.4.3)'
PrintableString : '0000'
SEQUENCE :
OBJECT_IDENTIFIER : 'SN (2.5.4.4)'
TeletexString : 'XXX'
SEQUENCE :
OBJECT_IDENTIFIER : 'G (2.5.4.42)'
TeletexString : 'YYY'
OpenSSL (and probably Java PKCS11 provider) loads this construction as is.
GnuTLS on load of the certificate sorts this construction in _asn1_ordering_set_of. So that it becomes:
SEQUENCE :
...
SET :
SEQUENCE :
OBJECT_IDENTIFIER : 'G (2.5.4.42)'
TeletexString : 'YYY'
SEQUENCE :
OBJECT_IDENTIFIER : 'SN (2.5.4.4)'
TeletexString : 'XXX'
SEQUENCE :
OBJECT_IDENTIFIER : 'CN (2.5.4.3)'
PrintableString : '0000'
Why does GnuTLS sort the set of sequences? What way should it be done, is it a GnuTLS bug or other libraries simply omit ordering?
RFC5280 has:
4.1. Basic Certificate Fields
The X.509 v3 certificate basic syntax is as follows. For signature
calculation, the data that is to be signed is encoded using the ASN.1
distinguished encoding rules (DER) [X.690].
So it seems to me that GnuTLS is doing the right thing.
It also looks like it's trying to encode a Distinguished Name, but does it wrong. It's valid according to the ASN1, because the spec is just really weird. You can have multiple values for each part. But you want the CN, SN and so on each in it's own SET, so all those SEQUENCEs should have had their own SET.
What way should it be done ...
The ITU recommends SET OF be encoded using BER, as long as there's no need for CER or DER. The best I can tell, there's no need. See below for a more detailed explanation in the realm of the ITU and ASN.1.
However, GnuTLS may be complying with a standard that creates the need. In this case, I'm not aware of which standard it is. See Kurt's answer.
I looked at RFC 5280, PKIX Certificate and CRL Profile, but I could not find the restriction. Maybe its in another PKIX document.
is it a GnuTLS bug or other libraries simply omit ordering?
I don't believe its a bug in GnuTLS per se. Its just the way the library does things. Take this modulo the requirement to do so in a RFC or other standard.
Also note that other libraries don't omit ordering. They use the order the attributes are presented in the certificate, which is an ordering :)
(comment) The problem is that GNUTLS rearranging results in failed SSL authentication
That sounds like a bug to me (modulo standards requirements). In this case, the bug is reordering the SET OF after a signature is placed upon the TBS/Certificate.
If GnuTLS is building the TBS/Certificate, then its OK to reorder until the signature is placed upon it.
(comment) Does GnuTLS put the elements of a SET OF type in the correct order according to DER rules
In ASN.1 encoding rules, X.690, BER/CER/DER:
8.12 Encoding of a set-of value
...
8.12.3 The order of data values need not be preserved by the encoding and subsequent decoding.
A SET OF does not appear to be ordered (for example, lexicographical order), so the sender can put them in any order, and a receiver can reorder them.
However, 11.6 says:
11 Restrictions on BER employed by both CER and DER
...
11.6 Set-of components
The encodings of the component values of a set-of value shall appear in ascending order, the encodings being compared as octet strings with the shorter components being padded at their trailing end with 0-octets.
NOTE – The padding octets are for comparison purposes only and do not appear in the encodings.
In the above, they are saying BER can be any order, but CER and DER are ascending order.
And last but not least, the Introduction says:
Introduction
...
... The basic encoding rules is more suitable than the canonical or distinguished encoding rules if the encoding contains a set value or set-of value and there is no need for the restrictions that the canonical and distinguished encoding rules impose ...
So the Introduction recommends BER for SET OF.
But in the big picture: the certificate is in BER. That's how it was signed. GnuTLS cannot change that once they get a hold of the certificate because of the signature over the certificate's data.
GnuTLS is free to create certificates in DER encoding. They just can't impose the encoding after the fact.
(comment) gnutls_certificate_set_x509_key_file(xcred, CERT_URL, KEY_URL, GNUTLS_X509_FMT_PEM);
I looked at the latest GnuTLS sources. That's appears to be the way its used in src/serv.c.
Apparently, _asn1_ordering_set_of was not working as expected in the past. It was improved in April, 2014. See PATCH 1/3: Make _asn1_ordering_set_of() really sort (and friends) on the GnuTLS mailing list.
Here are the hits for it in the sources:
$ grep -R -n _asn1_ordering_set_of * | grep -v doc
lib/minitasn1/coding.c:832: /* Function : _asn1_ordering_set_of */
lib/minitasn1/coding.c:843: _asn1_ordering_set_of (unsigned char *der, int der_len, asn1_node node)
lib/minitasn1/coding.c:1261: err = _asn1_ordering_set_of (der + len2, counter - len2, p);
The use around line 1261 is for asn1_der_coding. asn1_der_coding is used more frequently in other components...
(comment) but I'm not sure that it's bug in GNUTls and not on the server side, so I'd like to find out how it should work before doing anything
You should probably reach out ot the GnuTLS folks as detailed at B.3 Bug Reports. It looks like a bug in the processing of non-GnuTLS certificates.
To be clear, GnuTLS uses DER when it creates certificates and that's fine. But GnuTLS cannot impose ordering after it receives a non-GnuTLS certificate because that invalidates the signature.
Their test suite probably misses it because GnuTLS DER encodes SET OF. They likely are not aware its happening.

DSS, VRI - what is so mystructure?

I've just sign document using itext. I've LTV too.
I read in itext documentation - "The DSS contains references to certificates, and we can add
references to OCSP responses and CRLs that can be used to re-verify
the certificates"
yes, I fount them in my DSS.
I also read thet: "In the DSS, we can store VRI"
I dont understand why is VRI for? because there is the same OCSP responses and Certificates , which are in DSS.
Also wat does /61A2411B1..... means? is it some hash or Random number?
The structures you are interested in are defined in ETSI TS 102 778-4.
I dont understand why is VRI for? because there is the same OCSP responses and Certificates , which are in DSS.
While the Certs, OCSPs, and CRLs arrays in the DSS dictionary reference certificates, OCSP responses, and certificate revocation lists that may be used in the validation of any signatures in the document, the VRI dictionary contains Signature VRI dictionaries which reference the validation-related information for a single signature.
As your document contained but one signature, the information looks unnecessarily duplicated.
Also wat does /61A2411B1..... means? is it some hash or Random number?
The key of each entry in the VRI dictionary is the base-16-encoded (uppercase) SHA1 digest of the signature to which it applies.
PS: To clarify: key in the last sentence refers to the PDF structure: the VRI dictionary is a PDF dictionary in which a key (a PDF name object) is mapped to a value (another PDF object, in this case another PDF dictionary). It is not a cryptographic key from the signature...
Thus, you take the signature in question, calculate its SHA1 hash, write it in uppercase base-16-encoding, make that string as a PDF name, and then use that PDF name as PDF dictionary key.
Current latest ETSI specification is at https://www.etsi.org/deliver/etsi_en/319100_319199/31914201/01.01.01_60/en_31914201v010101p.pdf
You can leave the VRI out as its a duplicated data and actually not needed.
The VRI dictionary is optional, since all necessary data to validate
the signature can be available from other sources like the DSS
dictionary itself. The VRI dictionary offers possibilities for
optimization of the validation process, since it relates the data to
one specific signature.