Certificate Transparency, about how we know we are getting content from the Web site we intended to. This part is about how we know we're running the software we intended to. This question, how to defend against software supply chain attacks, has been in the news recently:
A hacker or hackers sneaked a backdoor into a widely used open source code library with the aim of surreptitiously stealing funds stored in bitcoin wallets, software developers said Monday.
The malicious code was inserted in two stages into event-stream, a code library with 2 million downloads that's used by Fortune 500 companies and small startups alike. In stage one, version 3.3.6, published on September 8, included a benign module known as flatmap-stream. Stage two was implemented on October 5 when flatmap-stream was updated to include malicious code that attempted to steal bitcoin wallets and transfer their balances to a server located in Kuala Lumpur.
See also here and here. The good news is that this was a highly specific attack against a particular kind of cryptocurrency wallet software; things could have been much worse. The bad news is that, however effective they may be against some supply chain attacks, none of the techniques I discuss below the fold would defend against this particular attack.
In an important paper entitled Software Distribution Transparency and Auditability, Benjamin Hof and Georg Carle from TU Munich use Debian's Advanced Package Tool (APT) as an example of a state-of-the-art software supply chain, and:
- Describe how APT works to maintain up-to-date software on clients by distributing signed packages.
- Review previous efforts to improve the security of this process.
- Propose to enhance APT's security by layering a system similar to Certificate Transparency (CT) on top.
- Detail the operation of their system's logs, auditors and monitors, which are similar to CT's in principle but different in detail.
- Describe and measure the performance of an implementation of their layer on top of APT using the Trillian software underlying some CT implementations.
- Reproducible Builds.
- Bootstrappable Compilers.
How APT Works
A system running Debian or another APT-based Linux distribution runs software it received in "packages" that contain the software files, and metadata that includes dependencies. The packages' hashes can be verified against those in a release file, signed by the distribution publisher. Packages come in two forms, source and compiled. The source of a package is signed by the official package maintainer and submitted to the distribution publisher. The publisher verifies the signature and builds the source to form the compiled package, whose hash is then included in the release file.
The signature on the source package verifies that the package maintainer approves this combination of files for the distributor to build. The signature on the release file verifies that the distributor built the corresponding set of packages from approved sources and that the combination is approved for users to install.
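The hash check described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the "release file" here is a hypothetical mapping from file name to SHA-256 digest rather than APT's real file format, and the signature on the release file is assumed to have been verified already.

```python
import hashlib
import hmac


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hex string."""
    return hashlib.sha256(data).hexdigest()


def verify_package(name: str, contents: bytes, release_hashes: dict) -> bool:
    """Check a downloaded package against the digest listed for it.

    release_hashes is a hypothetical, already signature-checked mapping
    of package file name -> expected SHA-256 hex digest.
    """
    expected = release_hashes.get(name)
    if expected is None:
        return False  # the release file does not list this package at all
    return hmac.compare_digest(sha256_hex(contents), expected)


# Made-up package name and contents, for illustration only.
good = b"pretend this is a compiled package"
release_hashes = {"hello_1.0_amd64.deb": sha256_hex(good)}

ok = verify_package("hello_1.0_amd64.deb", good, release_hashes)
tampered = verify_package("hello_1.0_amd64.deb", good + b" plus a backdoor", release_hashes)
```

The check only ties the bytes received to the bytes the publisher listed; it says nothing about whether the publisher's build was honest, which is the gap the rest of this post is about.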
Previous Work
It is, of course, possible for the private keys on which the maintainer's and distributor's signatures depend to be compromised:
Samuel et al. consider compromise of signing keys in the design of The Update Framework (TUF), a secure application updater. To guard against key compromise, TUF introduces a number of different roles in the update release process, each of which operates cryptographic signing keys.
The following three properties are protected by TUF. The content of updates is secured, meaning its integrity is preserved. Securing the availability of updates protects against freeze attacks, where an outdated version with known vulnerabilities is served in place of a security update. The goal of maintaining the correct combination of updates implies the security of meta data.
The goal of introducing multiple roles each with its own key is to limit the damage a single compromised key can do. An orthogonal approach is to implement multiple keys for each role, with users requiring a quorum of verified signatures before accepting a package:
Nikitin et al. develop CHAINIAC, a system for software update transparency. Software developers create a Merkle tree over a software package and the corresponding binaries. This tree is then signed by the developer, constituting release approval. The signed trees are submitted to co-signing witness servers.
The witnesses require a threshold of valid developer signatures to accept a package for release. Additionally, the mapping between source and binary is verified by some of the witnesses. If these two checks succeed, the release is accepted and collectively signed by the witnesses.
The system allows to rotate developer keys and witness keys, while the root of trust is an offline key. It also functions as a timestamping service, allowing for verification of update timeliness.
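The quorum idea, accepting a release only when a threshold of distinct trusted signers have approved it, might look roughly like the sketch below. It is not CHAINIAC's or TUF's actual protocol. In particular, the Python standard library has no public-key signatures, so HMAC with a per-signer secret stands in for a real signature scheme, and all names and keys are hypothetical.

```python
import hashlib
import hmac


def sign(key: bytes, message: bytes) -> bytes:
    """Stand-in for a real signature: an HMAC under the signer's secret."""
    return hmac.new(key, message, hashlib.sha256).digest()


def quorum_met(message: bytes, signatures: dict, trusted_keys: dict, threshold: int) -> bool:
    """Return True if at least `threshold` distinct trusted signers signed message.

    signatures:   signer name -> signature bytes
    trusted_keys: signer name -> key (hypothetical trust store)
    """
    valid_signers = set()
    for signer, signature in signatures.items():
        key = trusted_keys.get(signer)
        if key is not None and hmac.compare_digest(signature, sign(key, message)):
            valid_signers.add(signer)
    return len(valid_signers) >= threshold


trusted_keys = {"alice": b"key-a", "bob": b"key-b", "carol": b"key-c"}
release = b"hash of package 1.3.0"

two_valid = {"alice": sign(b"key-a", release), "bob": sign(b"key-b", release)}
one_valid = {"alice": sign(b"key-a", release), "mallory": sign(b"evil", release)}

accepted = quorum_met(release, two_valid, trusted_keys, threshold=2)
rejected = quorum_met(release, one_valid, trusted_keys, threshold=2)
```

With a threshold of two, a single compromised key is no longer enough to get a release accepted.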
CT-like Layer
Hof and Carle's proposal is to use verifiable logs, similar to those in CT, to ensure that malfeasance is detectable. They write:
Compromise of components and collusion of participants must not result in a violation of the following security goals remaining undetected. A goal of our system is to make it infeasible for the attacker to deliver targeted backdoors. For every binary, the system can produce the corresponding source code and the authorizing maintainer. Defined irregularities, such as a failure to correctly increment version numbers, also can be detected by the system.
As I understand it, this is accurate but somewhat misleading. Their system adds a transparency layer on top of APT:
The APT release file identifies, by cryptographic hash, the packages, sources, and meta data which includes dependencies. This release file, meta data, and source packages are submitted to a log server operating an append-only Merkle tree, as shown in Figure 2. The log adds a new leaf for each file.
We assume maintainers may only upload signed source packages to the archive, not binary packages. The archive submits source packages to one or more log servers. We further assume that the buildinfo files capturing the build environment are signed and are made public, e.g. by them being covered by the release file, together with other meta data.
In order to make the maintainers uploading a package accountable, a source package containing all maintainer keys is created and submitted into the archive. This constitutes the declaration by the archive, that these keys were authorized to upload for this release. The key ring is required to be append-only, where keys are marked with an expiry date instead of being removed. This allows verification of source packages submitted long ago, using the keys valid at the respective point in time.
Just as with CT, the log replies to each valid submission with a signed commitment, guaranteeing that it will shortly produce the signed root of a Merkle tree that includes the submission:
At release time, meta data and release file are submitted into the log as well. The log server produces a commitment for each submission, which constitutes its promise to include the submitted item into a future version of the tree. The log only accepts authenticated submissions from the archive. The commitment includes a timestamp, hash of the release file, log identifier and the log's signature over these. The archive should then verify that the log has produced a signed tree root that resolves the commitment. To complete the release, the archive publishes the commitments together with the updates. Clients can then proceed with the verification of the release file.
The log regularly produces signed Merkle tree roots after receiving a valid inclusion request. The signed tree root produced by the log includes the Merkle tree hash, tree size, timestamp, log identifier, and the log's signature.
The client now obtains from the distribution mirror not just the release file, but also one or more inclusion commitments showing that the release file has been submitted to one or more of the logs trusted both by the distributor and the client:
Given the release file and inclusion commitment, the client can verify by hashing that the commitment belongs to this release file and also verify the signature. The client can now query the log, demanding a current tree root and an inclusion proof for this release file. Per standard Merkle tree proofs, the inclusion proof consists of a list of hashes to recompute the received root hash. For the received tree root, a consistency proof is demanded to a previous known tree root. The consistency proof is again a list of hashes. For the two given tree roots, it shows that the log only added items between them. Clients store the signed tree root for the largest tree they have seen, to be used in any later consistency proofs. Set aside split view attacks, which will be discussed later, clients verifying the log inclusion of the release file will detect targeted modifications of the release.
Like CT, in addition to logs their system includes auditors, typically integrated with clients, and independent monitors regularly checking the logs for anomalies. For details, you need to read the paper, but some idea can be gained from their description of how the system detects two kinds of attack:
- The Hidden Version Attack
- The Split View Attack
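Before looking at the two attacks, it may help to see what "a list of hashes to recompute the received root hash" means in practice. The sketch below builds a small Merkle tree and checks an inclusion proof in the style of RFC 6962, the CT specification. It is an assumption that Hof and Carle's leaves are hashed this way; the leaf contents and function names are made up for illustration, and signature checks on the tree root are omitted.

```python
import hashlib


def leaf_hash(data: bytes) -> bytes:
    return hashlib.sha256(b"\x00" + data).digest()  # 0x00 prefix marks a leaf


def node_hash(left: bytes, right: bytes) -> bytes:
    return hashlib.sha256(b"\x01" + left + right).digest()  # 0x01 marks an interior node


def split_point(n: int) -> int:
    """Largest power of two strictly smaller than n (for n >= 2)."""
    k = 1
    while k * 2 < n:
        k *= 2
    return k


def merkle_root(leaves: list) -> bytes:
    if not leaves:
        raise ValueError("empty tree")
    if len(leaves) == 1:
        return leaf_hash(leaves[0])
    k = split_point(len(leaves))
    return node_hash(merkle_root(leaves[:k]), merkle_root(leaves[k:]))


def inclusion_proof(index: int, leaves: list) -> list:
    """Sibling hashes from the leaf up to the root (what a log would serve)."""
    if len(leaves) == 1:
        return []
    k = split_point(len(leaves))
    if index < k:
        return inclusion_proof(index, leaves[:k]) + [merkle_root(leaves[k:])]
    return inclusion_proof(index - k, leaves[k:]) + [merkle_root(leaves[:k])]


def root_from_proof(index: int, size: int, leaf: bytes, proof: list) -> bytes:
    """Recompute the root from one leaf and its proof (what a client does)."""
    if size == 1:
        if proof:
            raise ValueError("proof too long")
        return leaf_hash(leaf)
    if not proof:
        raise ValueError("proof too short")
    k = split_point(size)
    if index < k:
        return node_hash(root_from_proof(index, k, leaf, proof[:-1]), proof[-1])
    return node_hash(proof[-1], root_from_proof(index - k, size - k, leaf, proof[:-1]))


def verify_inclusion(index: int, size: int, leaf: bytes, proof: list, root: bytes) -> bool:
    try:
        return root_from_proof(index, size, leaf, proof) == root
    except ValueError:
        return False


# Hypothetical log contents: one leaf per submitted file.
files = [b"Release", b"Packages", b"Sources", b"hello_1.0.dsc", b"hello_1.0.tar.xz"]
root = merkle_root(files)
proof = inclusion_proof(3, files)
genuine = verify_inclusion(3, len(files), files[3], proof, root)
forged = verify_inclusion(3, len(files), b"backdoored source", proof, root)
```

The client never needs the whole log: the proof for one leaf in a tree of n leaves contains roughly log2(n) hashes.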
The Hidden Version Attack
Hof and Carle describe this attack thus:
The hidden version attack attempts to hide a targeted backdoor by following correct signing and log submission procedures. It may require collusion by the archive and an authorized maintainer. The attacker prepares targeted malicious update to a package, say version v1.2.1, and a clean update v1.3.0. The archive presents the malicious package only to the victim when it wishes to update. The clean version v.1.3.0 will be presented to everybody immediately afterwards.
A non-targeted user is unlikely to ever observe the backdoored version, thereby drawing a minimal amount of attention to it. The attack however leaves an audit trail in the log, so the update itself can be detected by auditing.
A package maintainer monitoring uploads for their packages using the log would notice an additional version being published. A malicious package maintainer would however not alert the public when this happens. This could be construed as a targeted backdoor in violation of the stated security goals.
It is true that the backdoored package would be in the logs, but that in and of itself does not indicate that it is malign:
To mitigate this problem a minimum time between package updates can be introduced. This can be achieved by a fixing the issuance of release files and their log submission to a static frequency, or by alerting on quick subsequent updates to one package.
There may be good reasons for releasing a new update shortly after its predecessor; for example a vulnerability might be discovered in the predecessor shortly after release.
In the hidden version attack, the attacker increases a version number in order to get the victim to update a package. The victim will install this backdoored update. The monitor detects the hidden version attack due to the irregular release file publication. There are now two cases to be considered. The backdoor may be in the binary package, or it may be in the source package.
The first case will be detected by monitors verifying the reproducible builds property. A monitor can rebuild all changed source packages on every update and check if the resulting binary matches. If not, the blame falls clearly on the archive, because the source does not correspond to the binary, which can be demonstrated by exploiting reproducible builds.
The second case requires investigation of the packages modified by the update. The source code modifications can be investigated for the changed packages, because all source code is logged. The fact that source code can be analyzed, and no analysis on binaries is required, makes the investigation of the hidden version alert simpler. The blame for this case falls on the maintainer, who can be identified by their signature on the source package. If the upload was signed by a key not in the allowed set, the blame falls on the archive for failing to authorize correctly.
If the package version numbers in the meta data are inconsistent, this constitutes a misbehavior by the submitting archive. It can easily be detected by a monitor. Using the release file the monitor can also easily ensure, by demanding inclusion proofs, that all required files have been logged.
Note that although their system's monitors detect this attack, and can correctly attribute it, they do so asynchronously. They do not prevent the victim installing the backdoored update.
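Two of the monitor checks mentioned in this section, alerting on quick successive updates and on version numbers that fail to increase, are simple enough to sketch. The record layout below is hypothetical, and versions are simplified to tuples of integers; real Debian version comparison, with epochs and revisions, is more involved.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LoggedUpdate:
    """Hypothetical summary of one logged package update."""
    package: str
    version: tuple   # simplified version, e.g. (1, 2, 1)
    timestamp: int   # seconds since the epoch, taken from the log entry


def audit_updates(entries: list, min_interval: int) -> list:
    """Return (reason, entry) alerts for suspicious updates in the log."""
    alerts = []
    latest = {}  # package name -> most recent entry seen so far
    for entry in sorted(entries, key=lambda e: e.timestamp):
        previous = latest.get(entry.package)
        if previous is not None:
            if entry.version <= previous.version:
                alerts.append(("version-not-incremented", entry))
            if entry.timestamp - previous.timestamp < min_interval:
                alerts.append(("quick-update", entry))
        latest[entry.package] = entry
    return alerts


DAY = 86_400
log_entries = [
    LoggedUpdate("libfoo", (1, 2, 0), 0),
    LoggedUpdate("libfoo", (1, 2, 1), 7 * DAY),        # the hidden version
    LoggedUpdate("libfoo", (1, 3, 0), 7 * DAY + 60),   # clean version a minute later
    LoggedUpdate("libbar", (2, 0), 0),
    LoggedUpdate("libbar", (2, 0), 10 * DAY),          # version reused
]
alerts = audit_updates(log_entries, min_interval=DAY)
```

As noted above, a quick follow-up release is not proof of malice, so an alert like this is a prompt for investigation and not a verdict.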
The Split View Attack
The logs cannot be assumed to be above suspicion. Hof and Carle describe a log-based attack:
The most significant attack by the log or with the collusion of the log is equivocation. In a split-view or equivocation attack, a malicious log presents different versions of the Merkle tree to the victim and to everybody else. Each tree version is kept consistent in itself. The tree presented to the victim will include a leaf that is malicious in some way, such as an update with a backdoor. It might also omit a leaf in order to hide an update. This is a powerful attack within the threat model that violates the security goals and must therefore be defended. A defense against this attack requires the client to learn if they are served from the same tree as the others.
Their defense requires that there be multiple logs under independent administration, perhaps run by different Linux distributions. Each time a "committing" log generates a new tree root containing new package submissions, it would be required to submit a signed copy of the root to one or more "witness" logs under independent administration. The "committing" log will obtain commitments from the "witness" logs, and supply them to clients. Clients can then verify that the root they obtain from the "committing" log matches that obtained directly from the "witness" logs:
When the client now verifies a log entry with the committing log, it also has to verify that a tree root covering this entry was submitted into the witnessing log. Additionally, the client verifies the append-only property of the witnessing log.
The witnessing log introduces additional monitoring requirements. Next to the usual monitoring of the append-only operation, we need to check that no equivocating tree roots are included. To this end, a monitor follows all new log entries of the witnessing log that are tree roots of the committing log. The monitor verifies that they are all valid extensions of the committing log's tree history.
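The simplest form of equivocation a monitor can catch is two different roots for the same log at the same tree size. The sketch below checks only that. The tuple layout is hypothetical, and a real monitor would also demand consistency proofs between roots of different sizes, which this omits.

```python
def find_equivocation(tree_heads: list) -> list:
    """Find conflicting roots among the tree heads a monitor has observed.

    tree_heads: (log_id, tree_size, root_hash_hex) tuples, for example the
    committing log's roots as recorded in a witnessing log plus those the
    monitor fetched directly. Returns (log_id, tree_size, first_root,
    conflicting_root) for every conflict found.
    """
    first_seen = {}
    conflicts = []
    for log_id, tree_size, root in tree_heads:
        key = (log_id, tree_size)
        if key not in first_seen:
            first_seen[key] = root
        elif first_seen[key] != root:
            conflicts.append((log_id, tree_size, first_seen[key], root))
    return conflicts


# Made-up observations: the log showed someone a different tree of size 101.
observed = [
    ("log-A", 100, "aa11"),
    ("log-A", 101, "bb22"),
    ("log-A", 101, "bb22"),   # same root seen again: fine
    ("log-A", 101, "ee55"),   # different root for the same size: split view
    ("log-B", 101, "cc33"),   # a different log: not comparable
]
conflicts = find_equivocation(observed)
```

Two validly signed tree roots of the same size with different hashes are evidence of misbehavior that anyone can check, which is what makes the witness arrangement useful.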
Reproducible Builds
One weakness in Hof and Carle's actual implementation is in the connection between the signed package of source and the hashes of the result of compiling it. It is in general impossible to verify that the binaries are the result of compiling the source. In many cases, even if the source is re-compiled in the same environment the resulting binaries will not be bit-for-bit identical, and thus their hashes will differ. The differences have many causes, including timestamps, randomized file names, and so on. Of course, changes in the build environment can also introduce differences.
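To illustrate how an embedded timestamp alone defeats bit-for-bit comparison, here is a toy "build" that just gzip-compresses its source; gzip stores a modification time in its header. The build function and the timestamps are made up. Pinning the time to an agreed value is the idea behind the Reproducible Builds project's SOURCE_DATE_EPOCH convention, though real builds have many more sources of variation than this.

```python
import gzip
import hashlib


def build(source: bytes, build_time: int) -> bytes:
    """Toy build step: compress the source, embedding build_time in the gzip header."""
    return gzip.compress(source, mtime=build_time)


def digest(artifact: bytes) -> str:
    return hashlib.sha256(artifact).hexdigest()


source = b"int main(void) { return 0; }\n"

# Two builds of identical source, one minute apart: the hashes differ.
first = digest(build(source, build_time=1_543_000_000))
second = digest(build(source, build_time=1_543_000_060))

# Pin the embedded time to an agreed value: the hashes now match.
PINNED_TIME = 1_540_000_000
repro_a = digest(build(source, PINNED_TIME))
repro_b = digest(build(source, PINNED_TIME))
```

Only when independent rebuilds produce the same hash can a monitor use a mismatch as evidence that the distributed binary does not correspond to the logged source.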
To enable binaries to be securely connected to their source, a Reproducible Builds effort has been under way for more than 5 years. Debian project lead Chris Lamb's 45-minute talk Think you're not a target? A tale of 3 developers ... provides an overview of the problem and the work to solve it using three example compromises:
- Alice, a package developer who is blackmailed to distribute binaries that don't match the public source.
- Bob, a build farm sysadmin whose personal computer has been compromised, leading to a compromised build toolchain in the build farm that inserts backdoors into the binaries.
- Carol, a free software enthusiast who distributes binaries to friends. An evil maid attack has compromised her laptop.
Bootstrappable Compilers
One of the most famous of the ACM's annual Turing Award lectures was Ken Thompson's 1984 Reflections On Trusting Trust (also here). In 2006, Bruce Schneier summarized its message thus:
Way back in 1974, Paul Karger and Roger Schell discovered a devastating attack against computer systems. Ken Thompson described it in his classic 1984 speech, "Reflections on Trusting Trust." Basically, an attacker changes a compiler binary to produce malicious versions of some programs, INCLUDING ITSELF. Once this is done, the attack perpetuates, essentially undetectably. Thompson demonstrated the attack in a devastating way: he subverted a compiler of an experimental victim, allowing Thompson to log in as root without using a password. The victim never noticed the attack, even when they disassembled the binaries -- the compiler rigged the disassembler, too.
Schneier was discussing David A. Wheeler's Countering Trusting Trust through Diverse Double-Compiling. Wheeler's subsequent work led to his 2009 Ph.D. thesis. To oversimplify, his technique involves the suspect compiler compiling its source twice, and comparing the output to that from a "trusted" compiler compiling the same source twice. He writes:
DDC uses a second “trusted” compiler cT, which is trusted in the sense that we have a justified confidence that cT does not have triggers or payloads
There are two issues here. The first is an assumption that the suspect compiler's build is reproducible. The second is the issue of where the "justified confidence" comes from. This is the motivation for the Bootstrappable Builds project, whose goal is to create a process for building a complete toolchain starting from a "seed" binary that is simple enough to be certified "by inspection". One sub-project is Stage0:
Stage0 starts with just a 280byte Hex monitor and builds up the infrastructure required to start some serious software development. With zero external dependencies, with the most painful work already done and real langauges such as assembly, forth and garbage collected lisp already implemented
The current 0.2.0 release of Stage0:
marks the first C compiler hand written in Assembly with structs, unions, inline assembly and the ability to self-host it's C version, which is also self-hosting
There is clearly a long way still to go to a bootstrapped full toolchain.
A More Secure Software Supply Chain
A software supply chain based on APT enhanced with Hof and Carle's transparency layer, distributing packages reproducibly built with bootstrapped compilers, would be much more difficult to attack than current technology. Users of the software could have much higher confidence that the binaries they installed had been built from the corresponding source, and that no attacker had introduced functionality not evident in the source.
These checks would take place during software installation or update. Users would still need to verify that the software had not been modified after installation, perhaps using a tripwire-like mechanism. But this mechanism would have a trustworthy source of the hashes it needs to do its job.
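A tripwire-like check of this kind amounts to re-hashing installed files and comparing them with a trusted manifest. The sketch below assumes a hypothetical manifest mapping relative paths to SHA-256 digests, of the sort the verified release data could supply. It is not the actual Tripwire tool, and it ignores the hard part, which is protecting the checker itself from a compromised system.

```python
import hashlib
import tempfile
from pathlib import Path


def file_sha256(path: Path) -> str:
    return hashlib.sha256(path.read_bytes()).hexdigest()


def find_modified(root: Path, manifest: dict) -> list:
    """Return the manifest paths that are missing or whose hash has changed.

    manifest: relative path -> expected SHA-256 hex digest (hypothetical format).
    """
    modified = []
    for relative, expected in sorted(manifest.items()):
        target = root / relative
        if not target.is_file() or file_sha256(target) != expected:
            modified.append(relative)
    return modified


# Demonstration in a throwaway directory with a made-up file.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "tool").write_bytes(b"original binary")
    manifest = {"tool": file_sha256(root / "tool")}

    clean = find_modified(root, manifest)        # nothing has changed yet
    (root / "tool").write_bytes(b"original binary + implant")
    tampered = find_modified(root, manifest)     # the modification is reported
```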
Remaining Software Problems
Despite all these enhancements, the event-stream attack would still have succeeded. The attackers targeted a widely-used, fairly old package that was still being maintained by the original author, a volunteer. They offered to take over what had become a burdensome task, and the offer was accepted. Now, despite the fact that the attacker was just an e-mail address, they were the official maintainer of the package and could authorize changes. Their changes, being authorized by the official package maintainer, would pass unimpeded through even the enhanced supply chain.
First, it is important to observe the goal of Hof and Carle's system is to detect targeted attacks, those delivered to a (typically small) subset of user systems. The event-stream attack was not targeted; it was delivered to all systems updating the package irrespective of whether they contained the wallet to be compromised. That their system is designed only to detect targeted attacks seems to me to be a significant weakness. It is very easy to design an attack, like the event-stream one, that is broadcast to all systems but is harmless on all but the targets.
Second, Hof and Carle's system operates asynchronously, so is intended to detect rather than prevent victim compromise. Of course, once the attack was detected it could be unambiguously attributed. But:
- The attack would already have succeeded in purloining cryptocurrency from the target wallets. This seems to me to be a second weakness; in many cases the malign package would only need to be resident on the victim for a short time to exfiltrate critical data, or install further malware providing persistence.
- Strictly speaking, the attribution would be to a private key. More realistically, it would be to a key and an e-mail address. In the case of an attack, linking these to a human malefactor would likely be difficult, leaving the perpetrators free to mount further attacks. Even if the maintainer had not, as in the event-stream attack, been replaced via social engineering, it is possible that their e-mail and private key could have been compromised.
Would a similar defense against "Sybil" attacks on the software supply chain be possible? There are a number of issues:
- The potential gains from such attacks are large, both because they can compromise very large numbers of systems quickly (event-stream had 2M downloads), and because the banking credentials, cryptocurrency wallets, and other data these systems contain can quickly be converted into large amounts of cash.
- Thus the penalty for mounting an attack would have to be an even larger amount of cash. Package maintainers would need to be bonded or insured for large sums, which implies that distributions and package libraries would need organizational structures capable of enforcing these requirements.
- Bonding and insurance would be expensive for package maintainers, who are mostly unpaid volunteers. There would have to be a way of paying them for their efforts, at least enough to cover the costs of bonding and insurance.
- Thus users of the packages would need to pay for their use, which means the packages could neither be free, nor open source.
Hof and Carle's system shares one more difficulty with CT. Both systems are layered on top of an existing infrastructure, respectively APT and TLS with certificate authorities. In both cases there is a bootstrap problem, an assumption that as the system starts up there is not an attack already underway. In CT's case the communications between the CAs, Web sites, logs, auditors and monitors all use the very TLS infrastructure that is being secured (see here and here). This is also the case for Hof and Carle, plus they have to assume the lack of malware in the initial state of the packages.
Hardware Supply Chain Problems
All this effort to secure the software supply chain will be for naught if the hardware it runs on is compromised:
- Much of what we think of as "hardware" contains software to which what we think of as "software" has no access or visibility. Examples include Intel's Management Engine, the baseband processor in mobile devices, and complex I/O devices such as NICs and GPUs. Even if this "firmware" is visible to the system CPU, it is likely supplied as a "binary blob" whose source code is inaccessible.
- Attacks on the hardware supply chain have been in the news recently, with the firestorm of publicity sparked by Bloomberg's probably erroneous reports of a Chinese attack on SuperMicro motherboards that added "rice-grain" sized malign chips.