Thursday, November 29, 2018

Certificate Transparency

Today is 2018's World Digital Preservation Day. It might appear that this post has little to do with digital preservation. However, I hope that the long post below the fold is the start of a series asking the simple question underlying not just digital preservation, but many areas of the digital world, "how do I know this digital content is real?" It is another of the many problems for which blockchain is touted as a solution by people lacking real-world understanding of either the problem or the technology, or both.

In this post I'm going to look in some detail at Certificate Transparency, an important initiative aimed at improving security and authenticity on the Web, and relate the techniques it uses to those underlying the LOCKSS system. In the next post I plan to ask how these techniques could be applied to other areas in which authenticity is important, such as in the software supply chain.

Certificate Transparency

How do I know that I'm talking to the right Web site? Because there's a closed padlock icon in the URL bar, right? Mozilla says:
A green padlock (with or without an organization name) indicates that:
  • You are definitely connected to the website whose address is shown in the address bar; the connection has not been intercepted.
  • The connection between Firefox and the website is encrypted to prevent eavesdropping.
The padlock icon appears when the browser has validated that the connection to the URL in the URL bar supplied a certificate for the site in question carrying a signature chain ending in one of the root certificates the browser trusts. Browsers come with a default list of root certificates from Certificate Authorities (CAs). Two years ago in The Curious Case of the Outsourced CA I wrote about this list:
The list of CAs that my Firefox trusts is here, all 188 of them. Note that because GeoTrust is in the list, it can certify its own website. I wrote about the issues around CAs back in 2013, notably that:
  • The browser trusts all of them equally.
  • The browser trusts CAs that the CAs on the list delegate trust to. Back in 2010, the EFF found more than 650 organizations that Internet Explorer and Firefox trusted.
  • Commercial CAs on the list, and CAs they delegate to, have regularly been found to be issuing false or insecure certificates.
Among the CAs on the list are agencies of many governments, such as the Dutch, Chinese, Hong Kong, and Japanese governments.
My current Firefox browser trusts 305 root certificates from 79 unique organizations. One of these organizations is the Internet Security Research Group, a not-for-profit organization hosted by the Linux Foundation and sponsored by many organizations including Mozilla and the EFF, which has greatly improved the information hygiene of the Web through a program called Let's Encrypt. This has provided over 115 million Web sites with free certificates carrying a signature chain rooted in a certificate that almost all browsers trust. This blog's certificate is one of them, as you can see by clicking on the padlock icon.

Some of the organizations whose root certificates my browser trusts are known to have abused this trust. For example, Dan Goodin writes in One-stop counterfeit certificate shops for all your malware-signing needs:
Barysevich identified four such sellers of counterfeit certificates since 2011. Two of them remain in business today. The sellers offered a variety of options. In 2014, one provider calling himself C@T advertised certificates that used a Microsoft technology known as Authenticode for signing executable files and programming scripts that can install software. C@T offered code-signing certificates for macOS apps as well. ... "In his advertisement, C@T explained that the certificates are registered under legitimate corporations and issued by Comodo, Thawte, and Symantec—the largest and most respected issuers,"
Abuse of the trust users place in CAs is routine:
Over the past few years there have been numerous instances of misissued certificates being used to spoof legitimate sites, and, in some cases, install malicious software or spy on unsuspecting users.

In one case, a prominent Dutch CA (DigiNotar) was compromised and the hackers were able to use the CA’s system to issue fake SSL certificates. The certificates were used to impersonate numerous sites in Iran, such as Gmail and Facebook, which enabled the operators of the fake sites to spy on unsuspecting site users. In another case, a Malaysian subordinate certificate authority (DigiCert Sdn. Bhd.) mistakenly issued 22 weak SSL certificates, which could be used to impersonate websites and sign malicious software. As a result, major browsers had to revoke their trust in all certificates issued by DigiCert Sdn. Bhd. (Note: DigiCert Sdn. Bhd. is not affiliated with the U.S.-based corporation DigiCert, Inc.)

More recently, a large U.S.-based CA (TrustWave) admitted that it issued subordinate root certificates to one of its customers so the customer could monitor traffic on their internal network. Subordinate root certificates can be used to create SSL certificates for nearly any domain on the Internet. Although Trustwave has revoked the certificate and stated that it will no longer issue subordinate root certificates to customers, it illustrates just how easy it is for CAs to make missteps and just how severe the consequences of those missteps might be.
Just yesterday, there was another example:
The issue with the two HeadSetup apps came to light earlier this year when German cyber-security firm Secorvo found that versions 7.3, 7.4, and 8.0 installed two root Certification Authority (CA) certificates into the Windows Trusted Root Certificate Store of users' computers but also included the private keys for all in the SennComCCKey.pem file.

In a report published today, Secorvo researchers published proof-of-concept code showing how trivial it would be for an attacker to analyze the installers for both apps and extract the private keys.

Making matters worse, the certificates are also installed for Mac users, via HeadSetup macOS app versions, and they aren't removed from the operating system's Trusted Root Certificate Store during current HeadSetup updates or uninstall operations.
Sennheiser's snafu, tracked as CVE-2018-17612, is not the first of its kind. In 2015, Lenovo shipped laptops with a certificate that exposed its private key in a scandal that became known as Superfish. Dell did the exact same thing in 2015 in a similarly bad security incident that became known as eDellRoot.
This type of “mistake” allows attackers to impersonate any Web site to affected devices. CAs are supposed to issue three grades of certificate based on increasingly rigorous validation:
  • Domain Validated (DV) certificates verify control over the DNS entries, email and Web content of the specified domain. They can be issued via automated processes, as with Let's Encrypt.
  • Organization Validated (OV) certificates are supposed to verify the legal entity behind the DV-level control of the domain, but in practice are treated the same as DV certificates.
  • Extended Validation (EV) certificates require "verification of the requesting entity's identity by a certificate authority (CA)". Verification is supposed to be an intrusive, human process.
But, as can be seen from the advert, the Extended Validation process is far from foolproof. The untrustworthiness of CAs should not be a surprise. Four years ago Security Collapse in the HTTPS Market, a fascinating analysis by Axel Arnbak et al of Amsterdam and Delft Universities of the (lack of) security on the Web from an economic rather than a technical perspective, showed that CAs lack incentives to be trustworthy. Arnbak et al write:
A crucial technical property of the HTTPS authentication model is that any CA can sign certificates for any domain name. In other words, literally anyone can request a certificate for a Google domain at any CA anywhere in the world, even when Google itself has contracted one particular CA to sign its certificate.
This "technical property" is actually important: it is what enables a competitive market of CAs. But Symantec in particular has exploited it wholesale:
Google's investigation revealed that over a span of years, Symantec CAs have improperly issued more than 30,000 certificates. Such mis-issued certificates represent a potentially critical threat to virtually the entire Internet population because they make it possible for the holders to cryptographically impersonate the affected sites and monitor communications sent to and from the legitimate servers. They are a major violation of the so-called baseline requirements that major browser makers impose on CAs as a condition of being trusted by major browsers.
But Symantec has suffered no effective sanctions because they are too big to fail:
Symantec's repeated violations underscore one of the problems Google and others have in enforcing terms of the baseline requirements. When violations are carried out by issuers with a big enough market share they're considered too big to fail. If Google were to nullify all of the Symantec-issued certificates overnight, it might cause widespread outages.
My Firefox still trusts Symantec root certificates. Because Google, Mozilla and others prioritize keeping the Web working over keeping it secure, deleting misbehaving big CAs from trust lists won't happen. When Mozilla writes:
You are definitely connected to the website whose address is shown in the address bar; the connection has not been intercepted.
they are assuming a world of honest CAs that isn't this world. If you have the locked padlock icon in your URL bar, you are probably talking to the right Web site, but there is a chance you aren't. Even if you are talking to the domain in the URL, Brian Krebs reports that:
Recent data from anti-phishing company PhishLabs shows that 49 percent of all phishing sites in the third quarter of 2018 bore the padlock security icon next to the phishing site domain name as displayed in a browser address bar. That’s up from 25 percent just one year ago, and from 35 percent in the second quarter of 2018.
Efforts to reduce the chance that you aren't have been under way for a long time. In 2008 Wendlandt et al published Perspectives: Improving SSH-style Host Authentication with Multi-Path Probing, describing a system that:
thwarts many of these attacks by using a collection of “notary” hosts that observes a server’s public key via multiple network vantage points (detecting localized attacks) and keeps a record of the server’s key over time (recognizing short-lived attacks). Clients can download these records on-demand and compare them against an unauthenticated key,
This is the basic idea that underlay the early efforts. Except for rare cases, such as a certificate being replaced after compromise or expiration, every time a service is accessed the client should receive the same certificate. Clients can consult a service that collects the (hash of) certificates as they are received at locations all over the Internet to detect the use of fraudulent certificates. Five years ago in Trust in Computer Systems I reviewed the state of the art in implementing this concept, including Moxie Marlinspike's 2011 Convergence, a distributed approach, and the EFF's SSL Observatory, a centralized approach. Note that both approaches are implemented without participation by certificate authorities or owners, and neither has achieved widespread adoption.
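The core comparison the notaries enable can be sketched in a few lines of Python. This is a toy model, not the Perspectives protocol itself; the fingerprint values and notary observations are hypothetical:

```python
import hashlib
from collections import Counter

def fingerprint(cert_der):
    """SHA-256 fingerprint of a DER-encoded certificate."""
    return hashlib.sha256(cert_der).hexdigest()

def consensus_view(observations):
    """The fingerprint most vantage points report for a server."""
    return Counter(observations).most_common(1)[0][0]

def looks_spoofed(client_fp, observations):
    """Flag the connection if the client's view disagrees with the notaries'."""
    return client_fp != consensus_view(observations)

# A localized attack: the client (and one nearby vantage point) see a
# different certificate from the rest of the Internet.
notaries = ["aa11", "aa11", "aa11", "ff00"]
print(looks_spoofed("ff00", notaries))  # → True, the client's view is anomalous
print(looks_spoofed("aa11", notaries))  # → False, it matches the consensus
```

The value of the scheme comes entirely from the diversity of vantage points: an attacker close to the client can substitute a certificate, but cannot easily make notaries elsewhere on the Internet see the same substitution.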

In 2012 Google started work on an approach that requires participation by certificate owners, specified in RFC6962, and called Certificate Transparency (CT):
Google's Certificate Transparency project fixes several structural flaws in the SSL certificate system, which is the main cryptographic system that underlies all HTTPS connections. These flaws weaken the reliability and effectiveness of encrypted Internet connections and can compromise critical TLS/SSL mechanisms, including domain validation, end-to-end encryption, and the chains of trust set up by certificate authorities. If left unchecked, these flaws can facilitate a wide range of security attacks, such as website spoofing, server impersonation, and man-in-the-middle attacks.

Certificate Transparency helps eliminate these flaws by providing an open framework for monitoring and auditing SSL certificates in nearly real time. Specifically, Certificate Transparency makes it possible to detect SSL certificates that have been mistakenly issued by a certificate authority or maliciously acquired from an otherwise unimpeachable certificate authority. It also makes it possible to identify certificate authorities that have gone rogue and are maliciously issuing certificates.
The basic idea is to accompany the certificate with a hash of the certificate signed by a trusted third party, attesting that the certificate holder told the third party that the certificate with that hash was current. Thus in order to spoof a service, an attacker would have to both obtain a fraudulent certificate from a CA, and somehow persuade the third party to sign a statement that the service had told them the fraudulent certificate was current. Clearly this is:
  • more secure than the current situation, which requires only compromising a CA, and
  • more effective than client-only approaches, which can detect that a certificate has changed but not whether the change was authorized.
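The attestation flow can be modelled in miniature as follows. Everything here is a made-up stand-in: a real log signs with a public-key scheme (ECDSA), for which the HMAC over a shared key below is only a simplification, and the key and record format are hypothetical:

```python
import hashlib
import hmac
import json

# Toy model of a Signed Certificate Timestamp (SCT).
LOG_KEY = b"log-private-key"  # hypothetical; a real log uses an ECDSA key pair

def issue_sct(cert_der, timestamp):
    """What the log returns when a CA submits a certificate."""
    leaf_hash = hashlib.sha256(cert_der).hexdigest()
    payload = json.dumps({"hash": leaf_hash, "ts": timestamp}).encode()
    sig = hmac.new(LOG_KEY, payload, hashlib.sha256).hexdigest()
    return {"hash": leaf_hash, "ts": timestamp, "sig": sig}

def verify_sct(cert_der, sct):
    """Client-side check: the signature is valid AND the hash matches the
    certificate actually presented by the server."""
    payload = json.dumps({"hash": sct["hash"], "ts": sct["ts"]}).encode()
    good_sig = hmac.new(LOG_KEY, payload, hashlib.sha256).hexdigest()
    return (hmac.compare_digest(sct["sig"], good_sig)
            and sct["hash"] == hashlib.sha256(cert_der).hexdigest())

cert = b"-----BEGIN CERTIFICATE----- ..."
sct = issue_sct(cert, 1543449600)
print(verify_sct(cert, sct))            # → True
print(verify_sct(b"fraudulent", sct))   # → False: hash does not match
```

Note the two checks the client performs: a fraudulent certificate fails the hash comparison even if it carries a stolen but genuine SCT, and a forged SCT fails the signature check.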
CT also requires participation from browser manufacturers:
In order to improve the security of Extended Validation (EV) certificates, Google Chrome requires Certificate Transparency (CT) compliance for all EV certificates issued after 1 Jan 2015.
Clients now need two lists of trusted third parties, the CAs and the sources of CT attestations. The need for these trusted third parties is where the blockchain enthusiasts would jump in and claim (falsely) that using a blockchain would eliminate the need for trust. But CT has a much more sophisticated approach, Ronald Reagan's "Trust, but Verify". In the real world it isn't feasible to solve the problem of untrustworthy CAs by eliminating the need for trust. CT's approach instead is to provide a mechanism by which breaches of trust, both by the CAs and by the attestors, can be rapidly and unambiguously detected. The CT team write:
One of the problems is that there is currently no easy or effective way to audit or monitor SSL certificates in real time, so when these missteps happen (malicious or otherwise), the suspect certificates aren’t usually detected and revoked for weeks or even months.
I don't want to go into too much detail, for that you can read RFC6962, but here is an overview of how CT works to detect breaches of trust. The system has the following components:
  • Logs, to which CAs report their current certificates, and from which they obtain attestations, called Signed Certificate Timestamps (SCTs) that owners can attach to their certificates. Clients can verify the signature on the SCT, then verify that the hash it contains matches the certificate. If it does, the certificate was the one that the CA reported to the log, and the owner validated. It is envisaged that there will be tens but not thousands of logs; Chrome currently trusts 26 logs. Each log maintains a Merkle tree data structure of the certificates for which it has issued SCTs.
  • Monitors, which periodically download all newly added entries from the logs they watch, verify that the entries have in fact been added to the log, and perform a series of validity checks on them. In doing so they also act as backups for those logs.
  • Auditors, which use the Merkle tree of the logs they audit to verify that certificates have been correctly appended to the log, and that no retroactive insertions, deletions or modifications of the certificates in the log have taken place. Clients can use auditors to determine whether a certificate appears in a log. If it doesn't, they can use the SCT to prove that the log misbehaved.
In this way, auditors, monitors and clients cooperate to verify the correct operation of logs, which in turn provides clients with confidence in the [certificate,attestation] pairs they use to secure their communications.

Figure 3 of How Certificate Transparency Works shows how the team believes it would normally be configured. As you can see, the monitor is actually part of the CA, checking at intervals that the log contains only the certificates the CA sent it. The auditor is part of the client, checking at intervals that the certificates in the SCTs it receives from Web servers are correctly stored in the log.

Although CT works if certificate owners each obtain their SCTs from only one log, RFC6962 recommends that:
TLS servers should send SCTs from multiple logs in case one or more logs are not acceptable to the client (for example, if a log has been struck off for misbehavior or has had a key compromise).
In other words, each certificate should be submitted to multiple logs. This is only one of the features of CT that are important for resilience:
  1. Each log operates independently.
  2. Each log gets its content directly from the CAs, not via replication from other logs.
  3. Each log contains a subset of the total information content of the system.
  4. There is no consensus mechanism operating between the logs, so it cannot be abused by, for example, a 51% attack.
  5. Monitoring and auditing is asynchronous to Web content delivery, so denial of service against the monitors and auditors cannot prevent clients obtaining service. Sustained over long periods it would gradually decrease clients' confidence in the Web sites' certificates.
Looking at the list of logs Chrome currently trusts, it is clear that almost all are operated by CAs themselves. Assuming that each monitor at each CA is monitoring some of the other logs as well as the one it operates, this does not represent a threat, because misbehavior by that CA would be detected by other CAs. A CA's monitor that was tempted to cover up misbehavior by a different CA's log it was monitoring would risk being "named and shamed" by some other CA monitoring the same log, just as the misbehaving CA would be "named and shamed".

It is important to observe that, despite the fact that CAs operate the majority of the CT infrastructure, its effectiveness in disciplining CAs is not impaired. Arnbak et al write:
  • Information asymmetry prevents buyers from knowing what CAs are really doing. Buyers are paying for the perception of security, a liability shield, and trust signals to third parties. None of these correlates verifiably with actual security. Given that CA security is largely unobservable, buyers’ demands for security do not necessarily translate into strong security incentives for CAs.
  • Negative externalities of the weakest-link security of the system exacerbate these incentive problems. The failure of a single CA impacts the whole ecosystem, not just that CA’s customers. All other things being equal, these interdependencies undermine the incentives of CAs to invest, as the security of their customers depends on the efforts of all other CAs.
The market for SSL certificates is highly concentrated, despite the large number of issuers. In fact, both data sets find that around 75 percent of SSL certificates in use on the public Web have been issued by just three companies: Symantec, GoDaddy, and Comodo.
Let's Encrypt may have had some effect on these numbers, which are from 2014.

All three major CAs have suffered reputational damage from recent security failures, although because they are "too big to fail" this hasn't impacted their business much. However, as whales in a large school of minnows it is in their interest to impose costs (for implementing CT) and penalties (for security lapses) on the minnows. Note that Google was sufficiently annoyed with Symantec's persistent lack of security that it set up its own CA. The threat that their business could be taken away by the tech oligopoly is real, and cooperating with Google may have been the least bad choice.

[Chart: 1-yr Bitcoin "price"]
Because these major corporations have an incentive to pay for the CT infrastructure, it is sustainable in a way that a market of separate businesses, or a permissionless blockchain supported by speculation in a cryptocurrency would not be.

Fundamentally, if applications such as CT attempt to provide absolute security they are doomed to fail, and their failures will be abrupt and complete. It is more important to provide the highest level of security compatible with resilience, so that the inevitable failures are contained and manageable. This is one of the reasons why permissionless blockchains, subject to 51% attacks, and permissioned blockchains, with a single, central point of failure, are not suitable.


Archives care about the authenticity of the content they collect, preserve and disseminate. From the start of the LOCKSS Program in 1998 we realized that:
The LOCKSS system preserves e-journals that have intrinsic value and contain information that powerful interests might want changed or suppressed.
To a much greater extent now than when we started, nodes in a LOCKSS network (LOCKSS boxes) collect and disseminate content via HTTPS, so CT is an important advance in archival authenticity. Five years ago I compared the problem of verifying certificates to the problem of verifying content in a digital preservation system such as LOCKSS:
A bad guy subverts an SSL connection from a browser to a website by intercepting the connection and supplying a valid certificate that is different from the one that actually certifies the website. For example, it might be signed by the Dept. of Homeland Security instead of by Thawte. The browser trusts both. Thus the question is: "how to detect that the certificate is not the one it should be?" This is a similar question to the digital preservation question: "how to detect that the content is not what it should be?" On a second or subsequent visit the browser can compare the certificate with the one it saw earlier, but this doesn't help the first time and it isn't conclusive. There are valid reasons why a certificate can change, for example to replace one that has expired.
From the start in 1998 the LOCKSS system used similar resilience techniques to CT, for the same reason. Each journal article was stored in some, but probably not all of the boxes, and:
  1. Each box operates independently.
  2. Each box gets its content directly from the publisher, not via replication from other boxes.
  3. Each box contains a subset of the total information content of the system. 
  4. Unlike CT, there is a consensus mechanism operating between the boxes. But, unlike blockchains, it is not designed to enforce lockstep agreement among the boxes. It allows a box to discover whether its content matches the consensus of a sample of the other boxes. This is similar to the way that CT's monitors and auditors examine subsets of the logs to detect misbehavior.
  5. Monitoring and auditing is performed by the boxes themselves, but is asynchronous to Web content delivery, so denial of service against the boxes cannot prevent clients obtaining service. Sustained over long periods it would gradually decrease clients' confidence in the preserved Web content.
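A minimal sketch of "voting among a sample", with hypothetical box names and content. The real LOCKSS polling protocol is considerably more elaborate (nonced hashes so votes cannot be replayed, rate limits, and so on), but the core comparison looks like this:

```python
import hashlib
import random
from collections import Counter

def content_hash(content):
    return hashlib.sha256(content).hexdigest()

def poll(my_content, peers, sample_size=3):
    """Sketch of a sampled poll: compare my copy of an article against a
    random sample of peer boxes and report whether I agree with the
    sample's consensus."""
    sample = random.sample(list(peers), k=min(sample_size, len(peers)))
    votes = Counter(content_hash(peers[p]) for p in sample)
    consensus, _ = votes.most_common(1)[0]
    return content_hash(my_content) == consensus

peers = {"box1": b"article", "box2": b"article",
         "box3": b"article", "box4": b"article"}
print(poll(b"article", peers))   # → True: my copy matches the consensus
print(poll(b"tampered", peers))  # → False: my copy needs repair
```

Because the sample is random, an attacker who has corrupted some boxes cannot predict which peers will be consulted, so sustained, undetected corruption requires subverting a large fraction of the whole population.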
A decade later, Wendlandt et al's Perspectives: Improving SSH-style Host Authentication with Multi-Path Probing applied similar "voting among a sample" techniques to the certificate problem:
Our system, PERSPECTIVES, thwarts many of these attacks by using a collection of “notary” hosts that observes a server’s public key via multiple network vantage points (detecting localized attacks) and keeps a record of the server’s key over time (recognizing short-lived attacks). Clients can download these records on-demand and compare them against an unauthenticated key, detecting many common attacks.
It is important to protect against compromised or malign notaries, and this is where LOCKSS-style "voting among a sample" comes in:
The final aspect of the notary design is data redundancy, a cross-validation mechanism that limits the power of a compromised or otherwise malicious notary server. To implement data redundancy each notary acts as a shadow server for several other notaries. As described below, a shadow server stores an immutable record of each observation made by another notary. Whenever a client receives a query reply from a notary, the client also checks with one or more of that notary’s shadow servers to make sure that the notary reply is consistent with the history stored by the shadow server.
Just as in the LOCKSS system, if there are enough notaries and enough shadow servers the bad guy can't know which notaries the client will ask, and which shadow servers the client will ask to vote on the replies.
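The shadow-server cross-check reduces to a simple consistency test; this sketch uses hypothetical fingerprint values:

```python
def consistent_with_shadows(notary_reply, shadow_records):
    """Cross-validation sketch: accept a notary's answer only if the
    immutable observation records held by its shadow servers agree."""
    return all(record == notary_reply for record in shadow_records)

# An honest notary: its shadows hold the same observation history.
print(consistent_with_shadows("fp-aa11", ["fp-aa11", "fp-aa11"]))  # → True
# A compromised notary lying about what it observed is caught.
print(consistent_with_shadows("fp-ff00", ["fp-aa11", "fp-aa11"]))  # → False
```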


  1. Tim Anderson's Google to bury indicator for Extended Validation certs in Chrome because users barely took notice reports that, as Troy Hunt tweeted:

    "And that’s that - for all intents and purposes, EV is now dead: “the Chrome Security UX team has determined that the EV UI does not protect users as intended”"

    There are two reasons for its demise. The one Anderson points to is:

    "The team have concluded that positive security indicators are largely ineffective. The direction for Chrome will be to highlight negative indicators like unencrypted (HTTP) connections, which are marked as "not secure", rather than emphasise when a connection is secure."

    The other one is that, because Certificate Authorities have a long history of abusing their trust, users are in fact right to place little reliance on Extended Validation. As Anderson writes:

    "Google's announcement will make it harder for certificate providers to market EV certificates. This is also another reason why you might just as well use free Let’s Encrypt certificates – no EV from Let's Encrypt, but it no longer matters."

    The CAs brought this on themselves, and it couldn't happen to a better bunch of crooks. And there's more! Shaun Nichols reports that Web body mulls halving HTTPS cert lifetimes. That screaming in the distance is HTTPS cert sellers fearing orgs will bail for Let's Encrypt:

    "CA/Browser Forum – an industry body of web browser makers, software developers, and security certificate issuers – is considering slashing the lifetime of HTTPS certs from 27 months to 13 months. ...
    slashing the lifetime may drive organizations into using Let's Encrypt for free, rather than encourage them to cough up payment more regularly to outfits like Digicert. Digicert and its ilk charge, typically, hundreds of dollars for their certs: forcing customers to fork out more often may be more of a turn off than a money spinner."

  2. Cory Doctorow writes in Creating a "coercion resistant" communications system:

    "Eleanor Saitta's ... 2016 essay "Coercion-Resistant Design" (which is new to me) is an excellent introduction to the technical countermeasures that systems designers can employ to defeat non-technical, legal attacks: for example, the threat of prison if you don't back-door your product.

    Saitta's paper advises systems designers to contemplate ways to arbitrage both the rule of law and technical pre-commitments to make it harder for governments to force you to weaken the security of your product or compromise your users.

    A good example of this is Certificate Transparency, a distributed system designed to catch Certificate Authorities that cheat and issue certificates to allow criminals or governments to impersonate popular websites like Google."

  3. In Web trust dies in darkness: Hidden Certificate Authorities undermine public crypto infrastructure, Thomas Claburn reports on an important paper, Rusted Anchors: A National Client-Side View of Hidden Root CAs in the Web PKI Ecosystem:

    "With the help of the 360 Secure Browser, a widely used browser in China, the researchers analyzed the certificate chains in web visits by volunteers over the course of five months, from February through June 2020.

    "In total, over 1.17 million hidden root certificates are captured and they cause a profound impact from the angle of web clients and traffic," the researchers report. "Further, we identify around five thousand organizations that hold hidden root certificates, including fake root CAs that impersonate large trusted ones."

    Hidden root certificates refer to root CAs that are not trusted by public root programs."

    I've written before about the problems misbehaving or spoofed CAs cause; I think the first time was 2014's Economic Failures of HTTPS. This paper probably requires a whole post to itself.