How secure internet connections work is often a mystery, even for fairly
technical people – let's rectify this!
Although some fairly complicated mathematics lays the foundation, it's not
necessary to grasp all of it to get a good high-level understanding of the
secure web. In this post, I'll give an overview of the major components of
the secure web and of how they interact. I want to shed some light on the
wizardry that both your browser and webservers do to make secure communication
over the internet possible. Specifically, the focus is on authentication: how
the public key infrastructure (PKI) assures you that you are really talking to
your bank and not to some scammer.
This is going to be a high-level overview with a lot of handwaving. We'll gloss
over some of the nitty-gritty details, and focus on the big picture.
Keypairs
Let's start off with the basic building block of public-key
cryptography, the public/private keypair. It consists of
two halves: the public part can be shared with the world, the private half must
stay hidden, for your eyes only. With such a keypair, you can do three pretty
nifty things:
- People can encrypt data with your public key, and only you can decrypt it.
- You can prove to other people (who have your public key) that you know the
private key. In some sense, they can verify your identity.
- You can sign data with your private key, and everyone with the public key
can verify that this data came from you and was not tampered with.
Let's look at an example to see how cool this is: let's say your bank has a
keypair and you know their public key (it could be printed in large letters on
the walls of their building, it's public after all). You can then securely
communicate with your bank, without anyone being able to listen in, even if
they can intercept or modify the messages between you and the bank
(encryption). You can be sure you really are talking to the bank, and not to a
fraudster (verification). And the bank can make statements that anyone who has
their public key can verify as genuine, i.e. as really coming from the bank
(signing).
The last part is nice because the bank can sign a statement about your balance,
give it to you, and you can forward it to your sleazy landlord who wants proof
of your financial situation. The bank and the landlord never talk to each other
directly, yet the landlord can be certain that the statement was made by the
bank and that you didn't tamper with it.
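To make the encryption and signing ideas a bit more concrete, here is a minimal sketch using textbook RSA in plain Python (3.8 or newer). The key is built from two well-known Mersenne primes and is far too small for real use. The function names, the messages and the missing padding are all simplifications of mine, so treat this as an illustration of the principle, not as how a bank would do it.

```python
import hashlib

# Toy RSA keypair from two known Mersenne primes. Illustration only:
# real keys are much larger and real schemes add padding.
P = 2**61 - 1
Q = 2**89 - 1
N = P * Q                              # modulus, shared by both halves
E = 65537                              # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))      # private exponent, must stay secret

PUBLIC_KEY = (N, E)
PRIVATE_KEY = (N, D)


def encrypt(public_key, message: bytes) -> int:
    """Anyone holding the public key can encrypt a short message."""
    n, e = public_key
    m = int.from_bytes(message, "big")
    if m >= n:
        raise ValueError("message too long for this toy key")
    return pow(m, e, n)


def decrypt(private_key, ciphertext: int) -> bytes:
    """Only the holder of the private key can recover the message."""
    n, d = private_key
    m = pow(ciphertext, d, n)
    return m.to_bytes((m.bit_length() + 7) // 8, "big")


def _digest(message: bytes, n: int) -> int:
    # Hash the message and squeeze the hash into the range of the modulus.
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % n


def sign(private_key, message: bytes) -> int:
    n, d = private_key
    return pow(_digest(message, n), d, n)


def verify(public_key, message: bytes, signature: int) -> bool:
    n, e = public_key
    return pow(signature, e, n) == _digest(message, n)


# You -> bank: encrypted with the bank's public key.
ciphertext = encrypt(PUBLIC_KEY, b"transfer 100")
recovered = decrypt(PRIVATE_KEY, ciphertext)

# Bank -> you -> landlord: a signed balance statement.
statement = b"Balance: 1234.56 EUR"
signature = sign(PRIVATE_KEY, statement)
landlord_accepts = verify(PUBLIC_KEY, statement, signature)
forgery_accepted = verify(PUBLIC_KEY, b"Balance: 9999.99 EUR", signature)
```

The landlord only needs `PUBLIC_KEY` to check the statement, and changing even one digit of the balance makes the check fail.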
So it's cool that we can securely communicate with our banks. We can do the
same with websites: once we have the public key of e.g. Google, it's easy to
set up an encrypted communication channel. Via the verification function of
keypairs, it's also easy to prove we really are talking to the real Google, and
not to some kid who's trying to steal our password to post dog pictures in our
cat groups.
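The "verification function" boils down to a challenge-response game: you send a random number, and only the holder of the private key can produce the matching answer. Here is a rough sketch with the same kind of toy RSA key. The names `make_challenge`, `respond` and `check` are made up for this post, and real protocols such as TLS hash and pad the challenge instead of signing it raw.

```python
import secrets

# Toy RSA keypair from two known Mersenne primes (illustration only).
P, Q = 2**61 - 1, 2**89 - 1
N, E = P * Q, 65537
D = pow(E, -1, (P - 1) * (Q - 1))

PUBLIC_KEY = (N, E)      # known to everyone
PRIVATE_KEY = (N, D)     # known only to the real server


def make_challenge(public_key) -> int:
    """Pick a fresh random number that nobody could have predicted."""
    n, _ = public_key
    return secrets.randbelow(n - 2) + 2


def respond(private_key, challenge: int) -> int:
    """Only the holder of the private key can compute this answer."""
    n, d = private_key
    return pow(challenge, d, n)


def check(public_key, challenge: int, response: int) -> bool:
    """Anyone with the public key can check the answer."""
    n, e = public_key
    return pow(response, e, n) == challenge


challenge = make_challenge(PUBLIC_KEY)

# The real server answers using its private key.
genuine_ok = check(PUBLIC_KEY, challenge, respond(PRIVATE_KEY, challenge))

# An impostor without the private key can only guess.
impostor_ok = check(PUBLIC_KEY, challenge, secrets.randbelow(N))
```

Because the challenge is fresh every time, recording an old answer and replaying it later doesn't help an attacker.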
How do we get Google's public key? This is where things start going
downhill.
In the bank example, we got the public key from the bank in person
(written on its front wall). With Google, it'd be kind of difficult to
travel to Mountain View just to get their public key. And we can't just go and
download the key from google.com, because the whole point is that we're not
sure that the google.com we're talking to is the real Google.
Are we completely out of luck? Can we communicate securely over the internet
only if we manually exchange keys beforehand, which we usually can't? It turns
out we are only sort-of out of luck: we are stuck with certificates and the
halfway-broken system of certificate authorities.
Certificates
A certificate contains several parts:
- the public half of a keypair, which also serves as an identifier for that keypair
- metadata that says who this keypair belongs to
- signatures: statements signed by other keys that vouch that the keypair
referenced here really belongs to the entity described in the metadata section
It's important to note that certificates don't need to be kept secret. The
keypair identifier doesn't reveal the private key, so certificates can be
shared freely. The corollary of this is that a certificate alone can't be used
to verify you're talking to anyone in particular. To be used for
authentication, it needs to be paired with the associated private key.
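As a rough mental model, a certificate is just a small record with these three parts. The sketch below is a simplification of mine and not the real X.509 format (which embeds the public key itself). The class and field names are made up, and the key is a toy RSA key built from two known Mersenne primes.

```python
import hashlib
from dataclasses import dataclass, field
from typing import List, Tuple


def fingerprint(public_key: Tuple[int, int]) -> str:
    """Hash of the public key: identifies the keypair, reveals no secret."""
    n, e = public_key
    return hashlib.sha256(f"{n}:{e}".encode()).hexdigest()


@dataclass
class Signature:
    signer: str   # who is vouching
    value: int    # their signature over the key and the metadata


@dataclass
class Certificate:
    public_key: Tuple[int, int]   # public half only, never the private one
    subject: str                  # metadata: who the keypair belongs to
    signatures: List[Signature] = field(default_factory=list)

    @property
    def key_id(self) -> str:
        return fingerprint(self.public_key)


# Toy RSA keypair (illustration only).
P, Q = 2**61 - 1, 2**89 - 1
N, E = P * Q, 65537
D = pow(E, -1, (P - 1) * (Q - 1))   # private exponent, stays with the owner

cert = Certificate(public_key=(N, E), subject="Hari Seldon")
print(cert.key_id)
```

Nothing in `cert` depends on `D`, which is why the certificate can be handed to anyone, and also why holding a copy of it proves nothing about who you are.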
With that out of the way, let's look at why certificates are useful. Say
someone gives you a certificate and proves they have the associated private
key. You've never met this person. However, the certificate carries signatures
from several keys that you know belong to close friends of yours. All of those
signatures attest that this person is called "Hari Seldon". If you trust your
friends, you can be pretty certain that this really is the person's name.
When you think about this, it's kind of neat. A stranger can authenticate to
you (prove that they are who they say they are) just because someone you trust
made a statement confirming the stranger's identity. You can be sure that this
statement really comes from your trusted friend, because it's signed with their
private key.
The same concept can be applied to websites. As long as there's someone you
trust and you have their public key, they can sign other parties' certificates
to confirm those identities to you. For example, they can sign a certificate for
Google that says "This really is the real google.com". When you see that
certificate and verify that the other party has the associated private key,
you'll have good reason to believe that you really are talking to Google's
google.com server, not some scam version by a North Korean hacker.
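Here is a sketch of what "checking who vouches for a certificate" could look like, again with toy RSA keys built from known Mersenne primes. The signer names "friend" and "mallory", the dictionary layout and the helper functions are all invented for this illustration. Real certificates and signature schemes are more involved.

```python
import hashlib


def make_keypair(p, q, e=65537):
    """Toy RSA keypair from two primes; returns (public, private)."""
    n = p * q
    d = pow(e, -1, (p - 1) * (q - 1))
    return (n, e), (n, d)


def _digest(data: bytes, n: int) -> int:
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % n


def sign(private_key, data: bytes) -> int:
    n, d = private_key
    return pow(_digest(data, n), d, n)


def verify(public_key, data: bytes, signature: int) -> bool:
    n, e = public_key
    return pow(signature, e, n) == _digest(data, n)


def payload(subject: str, subject_key) -> bytes:
    """The statement being vouched for: 'this key belongs to this subject'."""
    n, e = subject_key
    return f"{subject}|{n}|{e}".encode()


def is_vouched_for(cert: dict, trusted_keys: dict) -> bool:
    """Accept the certificate if any signature verifies under a trusted key."""
    data = payload(cert["subject"], cert["public_key"])
    for signer, signature in cert["signatures"]:
        key = trusted_keys.get(signer)
        if key is not None and verify(key, data, signature):
            return True
    return False


friend_pub, friend_priv = make_keypair(2**61 - 1, 2**89 - 1)
seldon_pub, _ = make_keypair(2**107 - 1, 2**127 - 1)
mallory_pub, mallory_priv = make_keypair(2**31 - 1, 2**61 - 1)

trusted_keys = {"friend": friend_pub}   # public keys you got in person

cert = {
    "subject": "Hari Seldon",
    "public_key": seldon_pub,
    "signatures": [
        ("friend", sign(friend_priv, payload("Hari Seldon", seldon_pub))),
    ],
}
# Same signature, but the name was changed afterwards.
renamed = dict(cert, subject="Someone Else")
# A certificate signed only by a key you have no reason to trust.
self_made = {
    "subject": "Hari Seldon",
    "public_key": mallory_pub,
    "signatures": [
        ("mallory", sign(mallory_priv, payload("Hari Seldon", mallory_pub))),
    ],
}

print(is_vouched_for(cert, trusted_keys))
print(is_vouched_for(renamed, trusted_keys))
print(is_vouched_for(self_made, trusted_keys))
```

Only the first certificate is accepted. Keep in mind that this checks the certificate alone. To authenticate the other party, you'd still have to make sure they hold the private key that goes with it.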
Certificate Authorities
So how do you find someone you can trust? And how does that person make sure
that the certificate they are signing really belongs to Google? They face the
same problems confirming that fact as you did! Does this even improve the
situation in any way?
It does – let's take the questions in order. The reality on the internet is:
it's not you trusting someone, it's your browser that does the trusting.
Your browser includes public keys from so-called "certificate
authorities" (CA's). You can find the list of CA's trusted by your own
browser in its options, under Advanced / Security / Certificates / Authorities.
If the browser sees certificates signed by any one of these keys, it believes
them to be true. It trusts CA's not to sign any bogus certificates.
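If you'd rather peek at a trust store from code, Python's standard `ssl` module can list the CA certificates it loads by default. Note that this is usually the operating system's store, not your browser's, so the list may differ from what the browser shows. On some systems the certificates are loaded lazily, in which case the list can come back empty. The helper `organization` is my own.

```python
import ssl

# A default context loads the platform's trusted CA certificates.
context = ssl.create_default_context()
ca_certs = context.get_ca_certs()


def organization(cert: dict) -> str:
    """Pull the organization name out of a certificate's subject field."""
    # The subject is a tuple of RDNs, each a tuple of (key, value) pairs.
    for rdn in cert.get("subject", ()):
        for key, value in rdn:
            if key == "organizationName":
                return value
    return "(no organization listed)"


print(f"{len(ca_certs)} CA certificates loaded")
for cert in ca_certs[:5]:
    print(" ", organization(cert))
```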
Why are these keys trustworthy? Because CA's are mostly operated by large
companies that have strict policies in place to make sure they only sign stuff
that's legit. How do they do that? After all, as an individual you'd have a
pretty tough time verifying that the public key offered by google.com
really belongs to Google. Don't CA's face the same problem?
Not really. There are billions of people accessing google.com. There are
only about 200 CA's that are trusted by the common browsers. And Google needs
a certificate signed by only one of them (one signature is enough to earn the
browser's trust). So Google can afford to prove its identity to a CA: by
sending written letters, a team of lawyers, or whatever. Once Google gets a
certificate for google.com signed by any reputable CA, it is recognized by
pretty much every device in the world.
Similarly, I, as a private person, can get a certificate for caichinger.com by
proving my identity and my ownership of this domain to the CA. The identity
part is usually done by submitting a scan of a driver's license or passport.
Ownership of the domain can be shown by uploading a file supplied by the CA to
the webserver. Once the CA confirms that the file is there, it knows I have
control of that domain.
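The file-upload check can be sketched in a few lines. In this simulation a dictionary stands in for the internet, and the class `ToyCA`, the path layout and the domain example.org are hypothetical. A real CA would fetch the file over HTTP and follow a much stricter procedure.

```python
import secrets


class ToyCA:
    """Hands out a challenge file and checks that it shows up on the domain."""

    def __init__(self, fetch):
        self._fetch = fetch      # callable(domain, path) -> str or None
        self._pending = {}       # domain -> (path, expected content)

    def issue_challenge(self, domain: str):
        token = secrets.token_urlsafe(32)
        path = f"/ca-challenge/{token[:8]}.txt"
        self._pending[domain] = (path, token)
        return path, token

    def domain_validated(self, domain: str) -> bool:
        if domain not in self._pending:
            return False
        path, expected = self._pending[domain]
        served = self._fetch(domain, path)
        return served is not None and secrets.compare_digest(served, expected)


# Stand-in for the internet: domain -> {path: file content}.
webservers = {"caichinger.com": {}, "example.org": {}}


def fetch(domain, path):
    return webservers.get(domain, {}).get(path)


ca = ToyCA(fetch)

# The real owner can upload the file to the webserver.
path, token = ca.issue_challenge("caichinger.com")
webservers["caichinger.com"][path] = token
owner_ok = ca.domain_validated("caichinger.com")

# Someone without access to example.org's webserver cannot.
ca.issue_challenge("example.org")
impostor_ok = ca.domain_validated("example.org")
```

The token is random and unguessable, so the only way to make it appear at the right path is to actually control the webserver behind the domain.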
So instead of me having to prove my identity to every single user visiting this
website, I can prove it once to a CA, and all browsers coming here will
recognize this as good enough. This way, CA's make the problem of
authentication of servers ("I'm the real google.com, not a cheap fake")
tractable. It's a system that has made the large-scale deployment of secure
internet traffic via HTTPS possible.
The half-broken part
Let's get back to the analogy of a stranger authenticating to you via a
certificate signed by someone you know. What if the signature wasn't from a
close friend of yours, but from a seedy guy you meet occasionally when going
out? Would you still have full confidence in the certificate? Hopefully not.
What does this mean for the web?
Not all of the CA's included in the common web browsers are the equivalent of a
trusted friend:
- They may be under the control of a government that wants a certificate for gmail.com, so it can read dissidents' emails
- An employee with access to the certificate authority key may create certificates for bank websites and sell them on the black market
- The computer network where the CA keys are stored could have been hacked
I'm pretty sure all three of those have actually happened in the past. Given
that a single forged certificate can be used to attack millions of users, CA's
are juicy targets. As soon as forged certificates are detected in the wild,
they tend to be blacklisted (blocked by browsers) very quickly, but there is
still a window of vulnerability.
For this reason, the whole CA system has been questioned over the last few
years, but replacing it does not seem feasible at the moment. There are
techniques (such as public key pinning) to augment the CA-based
authentication, but it takes time for them to be picked up by website owners.
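The idea behind public key pinning is simple: on top of the normal CA check, the client remembers a hash of the key a site is supposed to use and rejects anything else. Here is a hedged sketch. The function names, the placeholder key bytes and the pin table are invented for illustration, and real pinning mechanisms hash the encoded public key from the certificate.

```python
import base64
import hashlib


def pin(public_key_bytes: bytes) -> str:
    """Base64 of the SHA-256 hash of the server's encoded public key."""
    return base64.b64encode(hashlib.sha256(public_key_bytes).digest()).decode()


def connection_allowed(host, presented_key: bytes, ca_check_passed: bool, pins) -> bool:
    if not ca_check_passed:
        return False          # pinning augments the CA check, it doesn't replace it
    allowed = pins.get(host)
    if allowed is None:
        return True           # no pins known for this host: CA check only
    return pin(presented_key) in allowed


# Placeholder byte strings standing in for encoded public keys.
real_key = b"public key the site actually uses"
backup_key = b"spare key kept offline by the site"
forged_key = b"key from a forged certificate"

pins = {"google.com": {pin(real_key), pin(backup_key)}}

genuine = connection_allowed("google.com", real_key, True, pins)
forged = connection_allowed("google.com", forged_key, True, pins)
unpinned = connection_allowed("caichinger.com", forged_key, True, pins)
```

With pins in place, a forged certificate for google.com is rejected even if a trusted CA signed it. A site without pins, like the last case, gets no extra protection.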
While this is a problem, it mostly affects the largest websites (obtaining a
forged certificate is difficult and costly). Together with browser vendors,
these sites are developing new mitigation techniques against forged
certificates. In the meantime, the rest of us are still pretty well served by
the current CA system, even though it is not perfect.
Wrapup
So, this is it for an overview of the public key infrastructure that enables
secure connections to internet sites, from the basics of public key
cryptography to certificate authorities. If you want to dig deeper, I recommend
starting with the Wikipedia articles I linked throughout the article. If you
are interested in cryptography in general, I highly recommend Bruce Schneier's
book Applied Cryptography. It's 20 years old now, and still
enormously relevant today.
I hope this text helps a bit to clear up the confusion associated with public
key cryptography and the secure web. If you liked it, or if you have any
suggestions for improvement, please let me know in the comments!