A note on digital signatures and PKCS11 tokens

Tuesday, February 20, 2024

TMS Cryptography Pack, or TMSCP for short, contains a set of cryptographic algorithms and specific classes to address data encryption, hashing, signature and authentication. This post provides an overview of cryptographic signatures of PDF documents with a token, which is usually a USB stick containing a microchip such as the ones found in smart cards.
One of the reasons for such a post is the now widespread use of these tokens because of regulations on digital signature such as the EU eIDAS regulation [1].

These regulations impose very stringent rules on digital signatures, and the use of tokens protects the integrity of the digital certificates and private keys that are required to sign electronic documents and verify their signatures.

To ensure interoperability of the signature and verification processes between providers and users at large, an entire machinery is used in the background, ranging from Public Key Infrastructures (PKIs) to standards such as X509v3 [2], PKCS#11 [3], PKCS12 [9], PEM [10], CMS [4], ASN.1 [5], and ETSI documents [6][7]. That's indeed a lot of information to absorb and going into details would require a book rather than a mere post. We will skim through some of these standards to shed some light on what TMSCP provides.

1) How signing works

Although signing and verifying signatures is now very common, and is sometimes done transparently by protocols such as TLS, the operations in the background can be complicated.


There are several types of cryptographic protocols to sign and verify data and files. They all rely on the complexity of some operation being easy if you possess the appropriate information while close to impossible if you don't.

The best known and most widely used protocol is RSA, named after its inventors, Ronald Rivest, Adi Shamir and Leonard Adleman. The security of RSA relies on the difficulty of factoring large composite numbers that are the product of two large primes. RSA is an asymmetric cryptographic protocol, meaning that different keys are used to encrypt and decrypt, and to sign and verify. However, these two keys are related by a mathematical function and are therefore not independent from one another. If the key pair is properly generated, it is computationally infeasible for an adversary to recover the private key from the public key.

For the signature process to function, one key, called the private key, is used by the signatory (or sender) to sign, and the other key, called the public key, is used by the recipient to verify the signature. The public key must be provided to the recipient in order to verify a signature. This can be achieved by a specific protocol to share keys or by a simple directory. The private key must be kept secret, otherwise anyone possessing this key can impersonate its owner and sign on his/her behalf. Also, the recipient needs to ascertain that the public key is from its legitimate owner. These issues, and many others, are addressed by Public Key Infrastructures.
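
To make the roles of the two keys concrete, here is a toy sketch in Python of textbook RSA signing and verification. The tiny primes (61 and 53), the public exponent 17 and the function names are illustrative assumptions, and the digest is simply reduced modulo n. Real signatures use keys of 2048 bits or more together with a padding scheme, so this sketch is not secure and does not claim to reflect how TMSCP works internally.

```python
import hashlib

# Toy key pair (assumption: tiny primes, for illustration only).
P, Q = 61, 53
N = P * Q                      # public modulus
PHI = (P - 1) * (Q - 1)
E = 17                         # public exponent
D = pow(E, -1, PHI)            # private exponent (Python 3.8+)


def digest_as_int(message: bytes) -> int:
    """Hash the message and reduce the digest modulo N (toy shortcut)."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % N


def sign(message: bytes) -> int:
    """Sign with the private key (D, N)."""
    return pow(digest_as_int(message), D, N)


def verify(message: bytes, signature: int) -> bool:
    """Verify with the public key (E, N)."""
    return pow(signature, E, N) == digest_as_int(message)


if __name__ == "__main__":
    sig = sign(b"hello")
    print("signature:", sig, "valid:", verify(b"hello", sig))
```

Only the holder of D can produce a value that passes verify, while anyone who knows E and N can run the check.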

2) Certificates

Generating keys in a proper way is a good start but not enough to ensure the overall interoperability and trust in the signing process. Users can exchange public keys manually (you may have heard of key-signing parties to do so), but this doesn't scale up and can be very cumbersome in a professional environment. Keys alone are not enough to ensure trust in a distributed environment when users don't know each other. Does a buyer know who is behind every online shop?

That's one of the reasons why keys are packaged into certificates and certificates are signed by a recognised authority. There are several certificate formats, as you may know, with X509, PEM and PKCS12 being the most popular, for diverse reasons. Let's have a look at what an X509 certificate contains.

Certificate (X.509v3)
- Version Number
- Serial Number
- Signature Algorithm ID
- Issuer Name
- Validity period
    Not Before
    Not After
- Subject name
- Subject Public Key Info    
    Public Key Algorithm
    Subject Public Key
- Issuer Unique Identifier (optional)
- Subject Unique Identifier (optional)
- Extensions (optional)
    ...
- Certificate Signature Algorithm
- Certificate Signature


We can identify many expected fields, including the signature of the authority that issued the certificate. We also note the optional extensions, which can lead to problems when there are many of them.

Also we can see there is no private key in an X509 certificate. This key has to be stored elsewhere, either on the computer of the signing person or process (preferably in an encrypted form), or in a separate security container. A nice, and hard to compromise, security container is a specific purpose chip such as the ones in smartcards or SIM cards. If we store these chips in USB tokens, we then have a practical solution to carry certificates and private keys in a secure manner.

PEM files use a different approach and can store the private key, usually encrypted, alongside the certificate in the same file. This is much cheaper than a USB token, but much less secure.

3) Tokens

There is indeed a standard for such tokens and their application programming interface (API), called Public-Key Cryptography Standard number 11, or PKCS#11 for short. Tokens can be used for most cryptographic operations and can perform most of them internally, as they host various cryptographic algorithms on the internal chip. These algorithms are vendor dependent and may be limited in scope due to national regulations.

PKCS#11 tokens store, at least, certificates signed by the issuing authority and private keys generated according to national, local or company rules. These tokens require a driver, usually in the form of a DLL in their installation directory, to be used by applications or end users.

A user can indeed sign, say, a PDF document with a PEM key or an OpenSSL generated key but this signature cannot be traced back to a recognized authority, either international or national, and can't be fully trusted. This is an issue in professional transactions where so-called self-signed certificates are not accepted in most situations.

Tokens are a reasonably affordable alternative to store the cryptographic data (keys and others) and services to provide the necessary trust in online transactions and business data exchanges. However, they are not enough to ensure full interoperability of signatures (and other cryptographic services). The Cryptographic Message Syntax (CMS) was designed to provide this.

4) Cryptographic Message Syntax (CMS)

The CMS is used to digitally sign, digest, authenticate, or encrypt arbitrary message content, as RFC 5652 says in its introduction. We have apparently closed the loop and can now sign documents. However, as with most cryptographic standards, that would be too easy, and before we look at a PDF signature, we need to dive into what is called Abstract Syntax Notation One, or ASN.1, the nightmare of all crypto programmers in the universe.

5) ASN.1


ASN.1 is a language designed in the 1980s for describing the content and the encoding of messages sent and received by a protocol, initially mostly for networking. Like all languages, it has a grammar, which is fairly natural, and specific encodings to travel over networks or to be stored in files.

Here is an example of grammar. Let's say we want to describe contacts in a phone book, the record for a contact will look like:

Contact ::= SEQUENCE {
    name UTF8String,
    phone NumericString
}

SEQUENCE is a keyword, as well as UTF8String and NumericString. These are some of the types accepted by ASN.1.

This sequence cannot be stored as such in a certificate or a signature, so it has to be encoded independently of the systems, computers and networks that will process it. ASN.1 proposes many types of encodings: Basic Encoding Rules (BER), Distinguished Encoding Rules (DER), Canonical Encoding Rules (CER), Packed Encoding Rules (PER), Octet Encoding Rules (OER), XML Encoding Rules (XER), Extended XER (E-XER) and JSON Encoding Rules (JER).

BER, DER and CER use what is called a Tag Length Value (TLV) syntax, where T is a tag that identifies the type of the value, L is the length of the value in bytes and V is the actual value (a name or an assigned value).

If we want to store Niklas Wirth, with 00410521234567, in our phone book, the DER encoding will look like:
30 1E 0C 0C 4E 69 6B 6C 61 73 20 57 69 72 74 68 12 0E 30 30 34 31 30 35 32 31 32 33 34 35 36 37

The '30' hexadecimal value is the SEQUENCE tag and '1E' is the length of the sequence content (30 bytes). '0C' is the UTF8String tag, followed by its length '0C' (12 bytes) and the UTF-8 bytes of the name. '12' is the NumericString tag, followed by its length '0E' (14 bytes) and the digits of the phone number.
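
The TLV layout can be reproduced with a few lines of Python. This is a minimal sketch for the Contact type above: the helper names (der_length, tlv, encode_contact) are hypothetical, and only the three universal tags needed here are handled, namely 30 for SEQUENCE, 0C for UTF8String and 12 for NumericString, as assigned by X.680.

```python
def der_length(n: int) -> bytes:
    """Encode a length: one byte below 128, else 0x80|count followed by the bytes."""
    if n < 0x80:
        return bytes([n])
    body = n.to_bytes((n.bit_length() + 7) // 8, "big")
    return bytes([0x80 | len(body)]) + body


def tlv(tag: int, value: bytes) -> bytes:
    """Assemble one Tag-Length-Value element."""
    return bytes([tag]) + der_length(len(value)) + value


def encode_contact(name: str, phone: str) -> bytes:
    """DER-encode Contact ::= SEQUENCE { name UTF8String, phone NumericString }."""
    if not all(c in "0123456789 " for c in phone):
        raise ValueError("NumericString allows only digits and space")
    body = tlv(0x0C, name.encode("utf-8")) + tlv(0x12, phone.encode("ascii"))
    return tlv(0x30, body)


if __name__ == "__main__":
    # Print the encoded record as space-separated hex bytes.
    print(encode_contact("Niklas Wirth", "00410521234567").hex(" ").upper())
```

The same tlv helper nests naturally: the SEQUENCE is just a TLV whose value is the concatenation of the inner TLVs.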

A great feature of ASN.1 is the ability to register new objects that can then be given a value and encoded in sequences. These objects can be, in our case, institutions (e.g., international organisations, national authorities, companies) or specific identifiers such as an algorithm name. Object Identifiers (OIDs) are assigned tag 06, the hex value reserved for the OBJECT IDENTIFIER type.

For instance, sequence 06 09 60 86 48 01 65 03 04 02 01 decodes into an OID of 9 bytes with value 60 86 48 01 65 03 04 02 01. This value is 'compressed' and decodes into 2.16.840.1.101.3.4.2.1, which is the OID for SHA-256. Most OIDs can be found, for instance, at http://oid-info.com/ or https://oidref.com/.
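
The 'compression' works as follows: the first byte packs the first two arcs as 40 x first + second, and every following arc is written in base 128, with the high bit of each byte set except on the last one. Here is a minimal Python sketch of the decoding; the function name decode_oid is hypothetical, and the input is the value part only (without the 06 tag and the length byte).

```python
def decode_oid(content: bytes) -> str:
    """Decode the value bytes of an ASN.1 OBJECT IDENTIFIER into dotted notation."""
    if not content or content[-1] & 0x80:
        raise ValueError("empty or truncated OID")
    subids = []
    value = 0
    for byte in content:
        value = (value << 7) | (byte & 0x7F)   # accumulate 7 bits per byte
        if not byte & 0x80:                    # high bit clear: last byte of this arc
            subids.append(value)
            value = 0
    first = subids[0]
    if first < 40:
        arcs = [0, first]
    elif first < 80:
        arcs = [1, first - 40]
    else:
        arcs = [2, first - 80]
    return ".".join(str(arc) for arc in arcs + subids[1:])


if __name__ == "__main__":
    print(decode_oid(bytes.fromhex("608648016503040201")))
```

For the SHA-256 example, 60 hex is 96 = 40 x 2 + 16, which gives the leading 2.16, and 86 48 gives 6 x 128 + 72 = 840.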

Going back to the topic, both certificates and signatures will be encoded in ASN.1, mostly in DER, sometimes with minor variants.

6) Decoding a certificate from a token

Before we can read the content of a certificate, we need to extract it from a token and decode it. We extract the binary data and turn it into hex code. It looks like this:

30 82 0A 21 06 09 2A 86  48 86 F7 0D 01 07 02 A0
82 0A 12 30 82 0A 0E 02  01 01 31 0F 30 0D 06 09
60 86 48 01 65 03 04 02  01 05 00 30 0B 06 09 2A
86 48 86 F7 0D 01 07 01  A0 82 07 D5 30 82 07 D1
...
FB E9 BB 93 61 65 30 0D  06 09 60 86 48 01 65 03
04 02 01 05 00 A0 4B 30  18 06 09 2A 86 48 86 F7
0D 01 09 03 31 0B 06 09  2A 86 48 86 F7 0D 01 07
01 30 2F 06 09 2A 86 48  86 F7 0D 01 09 04 31 22
04 20 60 4D C1 3A 32 6C  13 A0 22 09 C7 1D 9E 2A
C6 08 14 31 78 E2 80 EE  1E 5C 7C 2C 26 E4 82 B4
C7 5A 30 0D 06 09 2A 86  48 86 F7 0D 01 01 0B 05
00 04 82 01 00 97 47 4A  4B 6C 41 5D 1C 35 55 04
C7 26 1D ED 6A E0 E8 4C  14 C3 24 C0 E9 40 C1 AC
6D 76 A9 5D 33 8A 82 C5  E5 1B FC 9D DA 01 E4 0D
AE 56 59 BC B6 C1 CE 95  35 15 A7 68 2E B0 5F 38
50 D3 6C D8 22 DC CE 4D  9B 2C 1A 5A FD 81 C1 80
12 AB D2 95 60 05 3E 43  B9 FA 0A 60 3B D7 7A FC
9E 4C AE 27 13 C4 49 C3  83 D8 69 AF 8B 77 1B A6
40 66 F4 2D BE A4 5C BF  CA 50 92 C2 CB 7C FF C5
50 FE 13 C6 85 EA 32 08  91 B2 92 21 65 4F 6A A5
F3 20 E9 69 2B 45 AB 20  D1 24 FB C9 C0 E0 BD B3
43 F0 F5 7F DB B4 44 09  A4 CE 57 60 82 82 62 27
0B C2 AD F9 52 4A 8C 0D  B4 3A 4A 40 38 05 58 05
50 20 3E 29 C8 05 2D 8E  71 3F 44 7C 52 E7 76 D2
E2 40 BF E0 BA 92 D5 38  F3 85 FD B9 70 81 3B 61
51 AC F6 13 7C 07 2A 96  40 80 F9 A8 12 7F A8 0D
AE E0 4B 9D 58 14 83 CF  E2 0E 47 32 7E AF 8B EF
AC 2B 77 D7 72

It is quite long because it contains at least a signature by the PKI Root Certification Authority that uses a 4096-bit RSA key and a 2048-bit public key, among others. The exact size of the certificate is given by the 3 bytes following the initial 30 code (for sequence): 82 0A 21. 82 means that the size of the sequence is higher than 127 and coded on 2 bytes. The 2 bytes are 0A 21, which in decimal is 10 x 256 + 33 = 2593 bytes. If we take into account the 4 bytes used for the sequence and the size, the total size of the certificate is 2597 bytes.
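
The length arithmetic above can be checked with a short Python sketch. The function name der_header is hypothetical, and the sketch assumes a single-byte tag and a definite length, which is what DER uses here.

```python
def der_header(buf: bytes):
    """Return (tag, content_length, header_size) for the TLV at the start of buf."""
    tag = buf[0]
    first = buf[1]
    if first < 0x80:
        # Short form: the byte itself is the length.
        return tag, first, 2
    count = first & 0x7F          # long form: number of length bytes that follow
    if count == 0:
        raise ValueError("indefinite length is not allowed in DER")
    length = int.from_bytes(buf[2:2 + count], "big")
    return tag, length, 2 + count


if __name__ == "__main__":
    tag, length, header = der_header(bytes.fromhex("30820A21"))
    print(hex(tag), length, header, length + header)
```

For the dump above, the content length is 2593 and the header takes 4 bytes, hence 2597 bytes in total.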

Going to the next byte, we identify a 06 tag, which announces an OID of length 09 and value 2A 86 48 86 F7 0D 01 07 02, that decodes into 1.2.840.113549.1.7.2 for 'signed data' according to the PKCS#7 (CMS) format. We then know that the following sequence will terminate with a signature. We can verify this signature if we extract the proper public key and authenticate it through the entire signature chain of trust.

Decoding the ASN.1 DER gives (figure 1):

ContentInfo SEQUENCE (2 elem)
  contentType ContentType OBJECT IDENTIFIER 1.2.840.113549.1.7.2 signedData (PKCS #7)
  content [0] (1 elem)
    SignedData SEQUENCE (5 elem)
      version CMSVersion INTEGER 1
      digestAlgorithms DigestAlgorithmIdentifiers SET (1 elem)
        DigestAlgorithmIdentifier SEQUENCE (2 elem)
          algorithm OBJECT IDENTIFIER 2.16.840.1.101.3.4.2.1 sha-256 (NIST Algorithm)
          parameters ANY NULL
      encapContentInfo EncapsulatedContentInfo SEQUENCE (1 elem)
        eContentType ContentType OBJECT IDENTIFIER 1.2.840.113549.1.7.1 data (PKCS #7)
      CertificateSet [?] [0] (1 elem)
        CertificateChoices SEQUENCE (3 elem)
          tbsCertificate TBSCertificate SEQUENCE (8 elem)
            version [0] (1 elem)
              Version INTEGER 2
            serialNumber CertificateSerialNumber INTEGER (127 bit) 135267424677374787765554620810885554533
            signature AlgorithmIdentifier SEQUENCE (2 elem)
              algorithm OBJECT IDENTIFIER 1.2.840.113549.1.1.11 sha256WithRSAEncryption (PKCS #1)
              parameters ANY NULL
            issuer Name SEQUENCE (5 elem)
              RelativeDistinguishedName SET (1 elem)
                AttributeTypeAndValue SEQUENCE (2 elem)
                  type AttributeType OBJECT IDENTIFIER 2.5.4.6 countryName (X.520 DN component)
                  value AttributeValue [?] PrintableString FR
              RelativeDistinguishedName SET (1 elem)
                AttributeTypeAndValue SEQUENCE (2 elem)
                  type AttributeType OBJECT IDENTIFIER 2.5.4.10 organizationName (X.520 DN component)
                  value AttributeValue [?] UTF8String DHIMYOTIS
              RelativeDistinguishedName SET (1 elem)
                AttributeTypeAndValue SEQUENCE (2 elem)
                  type AttributeType OBJECT IDENTIFIER 2.5.4.11 organizationalUnitName (X.520 DN component)
                  value AttributeValue [?] UTF8String 0002 48146308100036
              RelativeDistinguishedName SET (1 elem)
                AttributeTypeAndValue SEQUENCE (2 elem)
                  type AttributeType OBJECT IDENTIFIER 2.5.4.97 organizationIdentifier (X.520 DN component)
                  value AttributeValue [?] UTF8String NTRFR-48146308100036
              RelativeDistinguishedName SET (1 elem)
                AttributeTypeAndValue SEQUENCE (2 elem)
                  type AttributeType OBJECT IDENTIFIER 2.5.4.3 commonName (X.520 DN component)
                  value AttributeValue [?] UTF8String Certigna Identity Plus CA
...
          signatureAlgorithm SignatureAlgorithmIdentifier SEQUENCE (2 elem)
            algorithm OBJECT IDENTIFIER 1.2.840.113549.1.1.11 sha256WithRSAEncryption (PKCS #1)
            parameters ANY NULL
          signature SignatureValue OCTET STRING (256 byte) 97474A4B6C415D1C355504C7261DED6AE0E84C14C324C0E940C1AC6D76A95D338A82C…


This certificate is basically a cryptographically signed identity card for electronic transactions.
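
A decoder producing this kind of tree essentially walks the TLV elements recursively. Here is a minimal Python sketch, applied to the 15-byte DigestAlgorithmIdentifier that appears near the start of the dump (30 0D 06 09 60 86 48 01 65 03 04 02 01 05 00). The function names and the small tag table are assumptions made for illustration; the sketch only handles single-byte tags and definite lengths, and it does not interpret the values.

```python
NAMES = {
    0x02: "INTEGER", 0x03: "BIT STRING", 0x04: "OCTET STRING", 0x05: "NULL",
    0x06: "OBJECT IDENTIFIER", 0x0C: "UTF8String", 0x13: "PrintableString",
    0x30: "SEQUENCE", 0x31: "SET",
}


def read_tlv(buf: bytes, pos: int):
    """Return (tag, value, next_pos) for the TLV starting at pos."""
    tag = buf[pos]
    first = buf[pos + 1]
    if first < 0x80:                       # short-form length
        length, header = first, 2
    else:                                  # long-form length
        count = first & 0x7F
        length = int.from_bytes(buf[pos + 2:pos + 2 + count], "big")
        header = 2 + count
    start = pos + header
    end = start + length
    if end > len(buf):
        raise ValueError("truncated TLV")
    return tag, buf[start:end], end


def walk(buf: bytes, depth: int = 0) -> list:
    """List every TLV in buf, indenting the children of constructed types."""
    lines = []
    pos = 0
    while pos < len(buf):
        tag, value, pos = read_tlv(buf, pos)
        name = NAMES.get(tag, f"tag 0x{tag:02X}")
        lines.append("  " * depth + f"{name} ({len(value)} bytes)")
        if tag & 0x20:                     # bit 6 set: constructed, so recurse
            lines.extend(walk(value, depth + 1))
    return lines


if __name__ == "__main__":
    print("\n".join(walk(bytes.fromhex("300d06096086480165030402010500"))))
```

Context-specific tags such as A0, shown as [0] in the tree, are constructed too, so the same recursion applies to them.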

7) Portable Document Format (PDF)

PDF is originally a file format published by Adobe. It is now an ISO standard [8] and its size, over a thousand pages, is an indication of its complexity.

A PDF document is a set of objects (text blocks, images, etc.) listed in a catalogue and, in practice, indexed in a cross reference table. Section 12.8 of the ISO standard describes the signature object and states that "PDF 2.0 processors should support digital signatures based on the Cryptographic Message Syntax (CMS) and CAdES [6]". A signature is a specific type of object, labelled as such, encoded in ASN.1 DER and added to the original document in the form of new entries in the catalogue.

To compute a signature, the signatory must first compute a digest over a "byte range" (the data of the original file, split to insert the signature objects), perform a series of operations to build the CMS block and sign certain elements of this block before it is finalised and added to the resulting PDF document.

Here is an excerpt of the signature block and CMS content from a signed PDF.
8 0 obj
<<
/FT /Sig
/Type /Annot
/Subtype /Widget
/F 132
/T (Signature1)
/V 9 0 R
/P 4 0 R
/Rect [0.0 0.0 0.0 0.0]
>>
endobj
9 0 obj
<<
/Type /Sig
/Filter /Adobe.PPKLite
/SubFilter /ETSI.CAdES.detached
/Name (JOE BLOKE ID)
/M (D:20240212170638+00'00')
/Contents <30820AC306092a864886f70d010702a0820AB430820AB0020101310F300D06096086480165030402010500...00000000000000000000000>/ByteRange [0 55655 75657 564]
>>
endobj
10 0 obj
<<
/Length 19
/Info 3 0 R
/Root 1 0 R
/ID [<DC0636C5E21B5B88CFC4B813E5B68C5B><7D66433390840ADFCD8D499511EF70C8>]/Prev 54975 /Type /XRef
/Size 11 /Index [0 2 8 2]/W [1 2 0] /Filter /FlateDecode
>>
stream

endstream
endobj

We note the sequence starting with 30 right after the 'Contents' keyword.
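
The /ByteRange array lists (offset, length) pairs that together cover the whole file except the /Contents value. The digest over that range can be sketched in Python as follows. The function name byte_range_digest and the file name in the usage note are hypothetical, and SHA-256 is assumed because it is the digest algorithm announced in the CMS block shown earlier.

```python
import hashlib
from typing import Sequence


def byte_range_digest(pdf_bytes: bytes, byte_range: Sequence[int]) -> bytes:
    """Hash the parts of the file listed as (offset, length) pairs in /ByteRange."""
    if len(byte_range) % 2:
        raise ValueError("ByteRange must contain offset/length pairs")
    digest = hashlib.sha256()
    for offset, length in zip(byte_range[0::2], byte_range[1::2]):
        chunk = pdf_bytes[offset:offset + length]
        if len(chunk) != length:
            raise ValueError("byte range exceeds file size")
        digest.update(chunk)
    return digest.digest()
```

For the excerpt above, the call would be byte_range_digest(open("signed.pdf", "rb").read(), [0, 55655, 75657, 564]). The 20002 bytes skipped between the two ranges hold the /Contents value, that is, the hex-encoded CMS block and its padding zeros.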


Signature verification consists of performing the same procedure with cryptographic elements extracted from the CMS block, in particular the algorithm type (RSA, DSA or EC), key value and size, hash algorithm type (legacy SHA1, SHA2) and digest size. If a recipient wants to verify the full chain of trust, beyond the signature, other elements of the CMS block need to be decoded and verified. Because of the many possible options that can be used in the certificate and then the CMS block, decoding all OIDs and their values can be a real challenge and lead to errors if not thoroughly performed.

8) Conclusion

We have provided a simplified overview of the process and standards used to electronically sign PDF documents with cryptographic tokens, also referred to as PDF Advanced Electronic Signature (PAdES). TMSCP caters for PAdES, CAdES and XAdES (for XML files) and supports most tokens implementing the standards identified in the post.

However, because of the increasing number of options and subsequent OIDs in tokens and signatures, TMSCP users may experience technical issues in the use of their tokens. That's why we have issued a test application for PKCS#11 tokens. This executable has 3 menus (decode certificate from a token, extract and dump certificate, parse a certificate from a dump) and provides information on the content. It can be downloaded at X509DecoderV1_4.zip and errors can be sent to TMS via the support page for TMS CP or by email to the author.


P.S.: some anti-virus software flags the test programme as potentially dangerous. This is not the case; the warning is due to the fact that cryptographic algorithms are sometimes associated with ransomware.

The SHA256 digest of the ZIP file is: 029AF3118B2B650106C5D68084A24E1336C2E7F0FCB7BEC00A963DEEBCEBCD4D

The SHA256 digest of the EXE file is: F66E430C720BDC4B04BF4FD719F9D8627D69D3002F7177AD3319D47A08750DBB
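
Checking a download against these digests takes a few lines of Python. This is a minimal sketch; the function names are hypothetical and the local path of the downloaded file is an assumption.

```python
import hashlib


def sha256_hex(path: str, chunk_size: int = 65536) -> str:
    """Return the uppercase hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest().upper()


def matches(path: str, expected_hex: str) -> bool:
    """Compare a file's digest with a published value, ignoring case."""
    return sha256_hex(path) == expected_hex.strip().upper()
```

For instance, matches("X509DecoderV1_4.zip", "029AF3118B2B650106C5D68084A24E1336C2E7F0FCB7BEC00A963DEEBCEBCD4D") should return True for an unmodified download saved under that name.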

[1] https://eur-lex.europa.eu/legal-content/EN/TXT/?uri=uriserv%3AOJ.L_.2014.257.01.0073.01.ENG
[2] ITU-T Rec. X.509, Information technology - Open Systems Interconnection - The Directory: Public-key and attribute certificate frameworks
[3] https://www.oasis-open.org/committees/download.php/71450/pkcs11-spec-v3.2-wd02%28markup%29.pdf
[4] Cryptographic Message Syntax (CMS), RFC 5652, https://datatracker.ietf.org/doc/html/rfc5652
[5] ITU-T X.680, OSI networking and system aspects – Abstract Syntax Notation One (ASN.1), https://www.itu.int/ITU-T/studygroups/com17/languages/X.680-0207.pdf
[6] ETSI EN 319 122, https://www.etsi.org/deliver/etsi_en/319100_319199/31912201/01.02.01_60/en_31912201v010201p.pdf
[7] ETSI EN 319 141-1 and 141-2, Electronic Signatures and Infrastructures (ESI), PAdES digital signatures
[8] ISO 32000-2:2020, Document Management, Portable Document Format (PDF)
[9] PKCS #12: Personal Information Exchange Syntax v1.1, RFC 7292, https://datatracker.ietf.org/doc/html/rfc7292
[10] Textual Encodings of PKIX, PKCS, and CMS Structures, RFC 7468, https://datatracker.ietf.org/doc/html/rfc7468



Bernard



