Reposted from: http://www.cryptologie.net/article/262/what-are-x509-certificates-rfc-asn1-der/
What are x509 certificates? RFC? ASN.1? DER? April 2015
RFC
So, RFC stands for Request For Comments. RFCs are a bunch of text files that describe different protocols. If you want to understand how SSL, TLS (the successor to SSL) and x509 certificates (the certificates used for SSL and TLS) all work, for example because you want to code your own OpenSSL, then you will have to read the corresponding RFCs: rfc5280 for x509 certificates and rfc5246 for TLS 1.2, the latest version of TLS at the time of writing.
x509
x509 is the name for certificates which are defined for:
informal internet electronic mail, IPsec, and WWW applications
There used to be a version 1, and then a version 2, but now we use version 3. Reading the corresponding RFC, you will come across structures like this one:
Certificate ::= SEQUENCE {
tbsCertificate TBSCertificate,
signatureAlgorithm AlgorithmIdentifier,
signatureValue BIT STRING }
Those are ASN.1 structures. This is actually what a certificate should look like: it's a SEQUENCE of objects.
- The first object contains everything of interest that will be signed, that's why we call it a To Be Signed Certificate
- The second object identifies the signature algorithm the CA used to sign this certificate (e.g. sha256WithRSAEncryption)
- The last object is not really an object, it's just the bits of the signature computed over the DER encoding of the TBSCertificate
ASN.1
It looks small, but each object has some depth to it.
The TBSCertificate is the biggest one, containing a bunch of information about the subject (the client the certificate was issued to), the CA that issued it, the public key of the subject, etc.
TBSCertificate ::= SEQUENCE {
version [0] EXPLICIT Version DEFAULT v1,
serialNumber CertificateSerialNumber,
signature AlgorithmIdentifier,
issuer Name,
validity Validity,
subject Name,
subjectPublicKeyInfo SubjectPublicKeyInfo,
issuerUniqueID [1] IMPLICIT UniqueIdentifier OPTIONAL,
-- If present, version MUST be v2 or v3
subjectUniqueID [2] IMPLICIT UniqueIdentifier OPTIONAL,
-- If present, version MUST be v2 or v3
extensions [3] EXPLICIT Extensions OPTIONAL
-- If present, version MUST be v3
}
DER
A certificate is of course not sent like this. We use DER to encode this in a binary format.
Field names are not encoded at all, meaning that if we don't know how the certificate was structured, it will be impossible for us to understand what each value means.
Every value is encoded as a TLV triplet: [TAG, LENGTH, VALUE]
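To make the TLV idea concrete, here is a minimal Python sketch that splits a byte string into its triplets. The helper name `read_tlv` and the sample bytes are made up for illustration, and the sketch assumes single-byte tags and short lengths (values under 128 bytes) only.

```python
def read_tlv(data, offset=0):
    """Return (tag, length, value, next_offset) for the TLV starting at offset.

    Assumption: single-byte tag and short-form length (value under 128 bytes).
    """
    tag = data[offset]
    length = data[offset + 1]
    if length & 0x80:
        raise ValueError("long-form length is not handled in this sketch")
    start = offset + 2
    end = start + length
    if end > len(data):
        raise ValueError("truncated value")
    return tag, length, data[start:end], end


# A hand-made SEQUENCE holding two INTEGERs (1 and 2); not from a real certificate.
sample = bytes.fromhex("3006020101020102")

tag, length, value, _ = read_tlv(sample)
print(hex(tag), length, value.hex())  # 0x30 6 020101020102

# The value of a SEQUENCE is itself a run of TLV triplets.
children = []
pos = 0
while pos < len(value):
    child_tag, child_len, child_value, pos = read_tlv(value, pos)
    children.append((child_tag, child_len, child_value))
print(children)
```

Notice that nothing in the bytes says what the two integers mean; only the ASN.1 definition can tell you that.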
For example, you can check the GitHub certificate here
On the right is the hexdump of the DER encoded certificate, on the left is its translation in ASN.1 format.
As you can see, without the RFC nearby we don't really know what each value corresponds to. For completeness, here's the same certificate parsed by the openssl x509 command-line tool:
How to read the DER encoded certificate
So go back and check the hexdump of the GitHub certificate. Here is the beginning:
30 82 05 E0 30 82 04 C8 A0 03 02 01 02
As we saw in the RFC for x509 certificates, we start with a SEQUENCE.
Certificate ::= SEQUENCE {
Microsoft has documentation that explains pretty well how each ASN.1 TAG is encoded in DER. Here's the page on SEQUENCE:
30 82 05 E0
So 30 means SEQUENCE. Since we have a huge sequence (more than 127 bytes), we can't encode the length in the single byte that follows:
If it is more than 127 bytes, bit 7 of the Length field is set to 1 and bits 6 through 0 specify the number of additional bytes used to identify the content length.
(in their documentation the least significant bit on the far right is bit zero)
So the following byte 82, converted to binary: 1000 0010, tells us that the length of the SEQUENCE is written in the following 2 bytes, 05 E0 (1504 bytes).
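The length rule can be sketched in a few lines of Python. The function name `decode_length` is my own, not something from the RFC or from the Microsoft documentation.

```python
def decode_length(data, offset):
    """Decode a DER length field starting at data[offset].

    Returns (content_length, number_of_length_bytes_consumed).
    """
    first = data[offset]
    if first < 0x80:
        # Short form: bit 7 is clear, the byte itself is the length.
        return first, 1
    count = first & 0x7F  # bits 6..0: how many length bytes follow
    if count == 0:
        raise ValueError("indefinite length is not allowed in DER")
    raw = data[offset + 1: offset + 1 + count]
    if len(raw) != count:
        raise ValueError("truncated length field")
    return int.from_bytes(raw, "big"), 1 + count


header = bytes.fromhex("308205E0")  # first four bytes of the certificate
length, used = decode_length(header, 1)  # the length field starts after the tag
print(length, used)  # 1504 3
```

With the tag byte included, the header takes 4 bytes, so the whole certificate is 1504 + 4 = 1508 bytes long.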
We can keep reading:
30 82 04 C8 A0 03 02 01 02
Another Sequence embedded in the first one, the TBSCertificate SEQUENCE
TBSCertificate ::= SEQUENCE {
version [0] EXPLICIT Version DEFAULT v1,
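As a rough sketch, the first 13 bytes of the hexdump can be walked header by header in Python. The names `PREFIX` and `read_header` are hypothetical. Because we only have the beginning of the certificate, the loop steps inside every constructed value instead of skipping over it, which would not be the right strategy for a full parser.

```python
PREFIX = bytes.fromhex("308205E0308204C8A003020102")


def read_header(data, offset):
    """Return (tag, content_length, header_size) for the TLV at offset."""
    tag = data[offset]
    first = data[offset + 1]
    if first < 0x80:
        return tag, first, 2
    count = first & 0x7F  # long form: number of length bytes that follow
    length = int.from_bytes(data[offset + 2: offset + 2 + count], "big")
    return tag, length, 2 + count


headers = []
offset = 0
while offset < len(PREFIX):
    tag, length, header_size = read_header(PREFIX, offset)
    headers.append((offset, tag, length))
    constructed = bool(tag & 0x20)  # bit 5 set: the value holds nested TLVs
    # Step inside constructed values, skip over primitive ones.
    offset += header_size if constructed else header_size + length

for position, tag, length in headers:
    print(f"offset {position:2d}: tag {tag:02X}, content length {length}")
```

The four headers found are the Certificate SEQUENCE (1504 bytes), the TBSCertificate SEQUENCE (1224 bytes), the [0] wrapper (3 bytes) and the version INTEGER (1 byte).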
The first value should be the version of the certificate:
A0 03
Now this is a different kind of TAG. There are 4 classes of TAGs in ASN.1: UNIVERSAL, APPLICATION, PRIVATE, and context-specific. Most of what we use are UNIVERSAL tags, which can be understood by any application that knows ASN.1. The A0 is the [0] (and the following 03 is the length). [0] is a context-specific TAG and is used as an index when you have a series of objects. The GitHub certificate is a good example of this, because you can see that the next index used is [3], the extensions object:
TBSCertificate ::= SEQUENCE {
version [0] EXPLICIT Version DEFAULT v1,
serialNumber CertificateSerialNumber,
signature AlgorithmIdentifier,
issuer Name,
validity Validity,
subject Name,
subjectPublicKeyInfo SubjectPublicKeyInfo,
issuerUniqueID [1] IMPLICIT UniqueIdentifier OPTIONAL,
-- If present, version MUST be v2 or v3
subjectUniqueID [2] IMPLICIT UniqueIdentifier OPTIONAL,
-- If present, version MUST be v2 or v3
extensions [3] EXPLICIT Extensions OPTIONAL
-- If present, version MUST be v3
}
Since those objects are all optional, skipping some without properly indexing them would cause trouble when parsing the certificate.
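Here is a small Python sketch of how a tag byte splits into its class, its constructed bit and its number. The function name `describe_tag` is an assumption of mine, and tag numbers above 30 (which need more than one byte) are left out.

```python
# Index = the two high bits of the tag byte.
CLASSES = ("UNIVERSAL", "APPLICATION", "context-specific", "PRIVATE")


def describe_tag(octet):
    """Split a single tag byte into (class, constructed, number)."""
    tag_class = CLASSES[octet >> 6]   # bits 7 and 6
    constructed = bool(octet & 0x20)  # bit 5
    number = octet & 0x1F             # bits 4 to 0
    if number == 0x1F:
        raise ValueError("multi-byte tag numbers are not handled in this sketch")
    return tag_class, constructed, number


print(describe_tag(0x30))  # ('UNIVERSAL', True, 16), a SEQUENCE
print(describe_tag(0xA0))  # ('context-specific', True, 0), the [0] before version
print(describe_tag(0xA3))  # ('context-specific', True, 3), the [3] before extensions
print(describe_tag(0x02))  # ('UNIVERSAL', False, 2), an INTEGER
```

The EXPLICIT tags [0] and [3] wrap another TLV, which is why their constructed bit is set.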
Next comes:
02 01 02
Here's how it reads:
_______ tag: integer
| ____ length: 1 byte
| | _ value: 2
| | |
| | |
v v v
02 01 02
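The same reading can be written as a short Python sketch. `decode_integer` and `VERSION_NAMES` are names I made up; the mapping itself comes from the Version definition in rfc5280, where v1 is 0, v2 is 1 and v3 is 2. The sketch assumes a short-form length.

```python
def decode_integer(tlv):
    """Decode a DER INTEGER given its complete TLV bytes (short lengths only)."""
    if tlv[0] != 0x02:
        raise ValueError("not an INTEGER tag")
    length = tlv[1]
    if length >= 0x80:
        raise ValueError("long-form length is not handled in this sketch")
    content = tlv[2:2 + length]
    # DER integers are big-endian two's complement.
    return int.from_bytes(content, "big", signed=True)


VERSION_NAMES = {0: "v1", 1: "v2", 2: "v3"}

value = decode_integer(bytes.fromhex("020102"))
print(value, VERSION_NAMES[value])  # 2 v3
```

So a value of 2 here means the GitHub certificate is a version 3 certificate.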
The rest is pretty straightforward except for OID: Object Identifier.
Object Identifiers
They are basically strings of integers that read from left to right like a tree.
So in our GitHub cert example, we can see that the first OID is 1.2.840.113549.1.1.11, and it is supposed to represent the signature algorithm.
So go to http://www.alvestrand.no/objectid/top.html and click on 1, and then 1.2, and then 1.2.840, etc... until you get down to the last branch of our tree and you will end up on sha256WithRSAEncryption.
Here's a more detailed explanation on OID and here's the Microsoft doc on how to encode OID in DER.
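For the curious, here is a minimal Python sketch of how the content bytes of an OID decode back to the dotted form. The function name `decode_oid` is hypothetical, and the input is only the VALUE part (the bytes after the 06 tag and the length). The first byte packs the first two numbers together, and every other number is written in base 128 with the high bit set on all bytes but the last.

```python
def decode_oid(content):
    """Decode the value bytes of a DER OBJECT IDENTIFIER into dotted notation."""
    subidentifiers = []
    value = 0
    for byte in content:
        value = (value << 7) | (byte & 0x7F)  # 7 payload bits per byte
        if not byte & 0x80:                   # high bit clear: last byte of this number
            subidentifiers.append(value)
            value = 0
    first = subidentifiers[0]
    # The first subidentifier encodes 40 * arc1 + arc2.
    if first < 40:
        arcs = [0, first]
    elif first < 80:
        arcs = [1, first - 40]
    else:
        arcs = [2, first - 80]
    arcs.extend(subidentifiers[1:])
    return ".".join(str(arc) for arc in arcs)


# Value bytes of the DER encoding of sha256WithRSAEncryption.
oid = decode_oid(bytes.fromhex("2A864886F70D01010B"))
print(oid)  # 1.2.840.113549.1.1.11
```

For example, 86 48 gives (6 * 128) + 72 = 840, and 86 F7 0D gives 113549.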