Friday, January 5, 2018

Meltdown & Spectre

This hasn't been a good few months for Intel. I wrote in November about the vulnerabilities in their Management Engine. Now they and other CPU manufacturers are facing Meltdown and Spectre, three major vulnerabilities caused by side-effects of speculative execution. The release of these vulnerabilities was rushed and the initial reaction less than adequate.

The three vulnerabilities are very serious, but mitigations are in place and appear to be less costly than reports focused on the worst case would lead you to believe. Below the fold, I look at the reaction, explain what speculative execution means, and point to the best explanation I've found of where the vulnerabilities come from and what the mitigations do.

Although CPUs from AMD and ARM are also affected, Intel's initial response was pathetic, as Peter Bright reports at Ars Technica:
The company's initial statement, produced on Wednesday, was a masterpiece of obfuscation. It contains many statements that are technically true—for example, "these exploits do not have the potential to corrupt, modify, or delete data"—but utterly beside the point. Nobody claimed otherwise! The statement doesn't distinguish between Meltdown—a flaw that Intel's biggest competitor, AMD, appears to have dodged—and Spectre and, hence, fails to demonstrate the unequal impact on the different company's products.
In addition, Intel's CEO is suspected of insider trading on information about these vulnerabilities:
Brian Krzanich, chief executive officer of Intel, sold millions of dollars' worth of Intel stock—all he could part with under corporate bylaws—after Intel learned of Meltdown and Spectre, two related families of security flaws in Intel processors.
Not a good look for Intel. Nor for AMD:
AMD's response has a lot less detail. AMD's chips aren't believed susceptible to the Meltdown flaw at all. The company also says (vaguely) that it should be less susceptible to the branch prediction attack.

The array bounds problem has, however, been demonstrated on AMD systems, and for that, AMD is suggesting a very different solution from that of Intel: specifically, operating system patches. It's not clear what these might be—while Intel released awful PR, it also produced a good whitepaper, whereas AMD so far has only offered PR—and the fact that it contradicts both Intel (and, as we'll see later, ARM's) response is very peculiar.
The public release of details about Meltdown and Spectre was rushed, as developers not read-in to the problem started figuring out what was going on. This may have been due to an AMD engineer's comment:
Just after Christmas, an AMD developer contributed a Linux patch that excluded AMD chips from the Meltdown mitigation. In the note with that patch, the developer wrote, "The AMD microarchitecture does not allow memory references, including speculative references, that access higher privileged data when running in a lesser privileged mode when that access would result in a page fault."
What is speculative execution? Some things a CPU does, such as fetching data from main memory after a cache miss, take hundreds of clock cycles. It is a waste to stop the CPU while it waits for these operations to complete. So the CPU continues to execute "speculatively". For example, it can guess which way it is likely to go at a branch, and head off down that path ("branch prediction"). If it is right, it has saved a lot of time. If it is wrong, the processor state accumulated during the speculative execution has to be hidden from the real program.

Modern processors have lots of hardware supporting speculative execution. Meltdown and Spectre are both due to cases where the side-effects of speculative execution on this hardware are not completely hidden. They can be revealed, for example, by carefully timing operations in the real program, which the leftover speculative state can cause to take longer or shorter than normal.
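
As a rough illustration of how state that is "hidden" from the program can still be observed through timing, here is a toy Python model. It is a sketch, not a description of any real processor: the ToyCPU class, the cycle costs, and the memory layout are all assumptions invented for the example. The speculative path has its register results rolled back, but the cache entries it touched remain, and a later pass that "times" loads can tell which slot was touched.

```python
# Toy model only: costs, names, and layout are illustrative assumptions.
CACHE_HIT_COST = 1      # pretend cycle count for a cached load
CACHE_MISS_COST = 200   # pretend cycle count for a load from main memory

PROBE_SLOTS = 16        # addresses 0..15 form the attacker's probe array
SECRET_ADDR = 16        # address holding the value the attacker wants


class ToyCPU:
    def __init__(self, memory):
        self.memory = list(memory)
        self.cache = set()      # cached addresses (not visible to programs)
        self.registers = {}     # architectural state (visible to programs)

    def load(self, addr):
        """Return (value, cost). Every load leaves addr in the cache."""
        cost = CACHE_HIT_COST if addr in self.cache else CACHE_MISS_COST
        self.cache.add(addr)
        return self.memory[addr], cost

    def speculate(self, work):
        """Run work as a wrong guess: registers are restored afterwards,
        but the cache is left as the speculative loads changed it."""
        saved = dict(self.registers)
        work(self)
        self.registers = saved


def mispredicted_path(cpu):
    """Code that should never have run: reads the secret, then uses it
    as an address, which pulls that probe slot into the cache."""
    secret, _ = cpu.load(SECRET_ADDR)
    cpu.registers["r0"] = secret
    cpu.load(secret)


def recover_secret(cpu):
    """Time a load of every probe slot; the single fast one is the leak."""
    timings = {addr: cpu.load(addr)[1] for addr in range(PROBE_SLOTS)}
    return min(timings, key=timings.get)


cpu = ToyCPU([0] * PROBE_SLOTS + [11])   # secret value 11 at SECRET_ADDR
cpu.speculate(mispredicted_path)
recovered = recover_secret(cpu)
print(cpu.registers)   # {}  (the rollback hid the register write)
print(recovered)       # 11  (but the cache timing did not hide the secret)
```

Real attacks have to cope with noisy timings and much larger probe arrays; the model only shows why restoring registers alone is not enough.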

The clearest explanation of the three vulnerabilities I've seen is from

  • Variant 1 (CVE-2017-5753), “bounds check bypass.” This vulnerability affects specific sequences within compiled applications, which must be addressed on a per-binary basis.
  • Variant 2 (CVE-2017-5715), “branch target injection”. This variant may either be fixed by a CPU microcode update from the CPU vendor, or by applying a software mitigation technique called “Retpoline” to binaries where concern about information leakage is present. This mitigation may be applied to the operating system kernel, system programs and libraries, and individual software programs, as needed.
  • Variant 3 (CVE-2017-5754), “rogue data cache load.” This may require patching the system’s operating system. For Linux there is a patchset called KPTI (Kernel Page Table Isolation) that helps mitigate Variant 3. Other operating systems may implement similar protections - check with your vendor for specifics.
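
To make the "specific sequences" of Variant 1 a little more concrete, here is a toy Python model of the pattern: a bounds check followed by a load whose address depends on the checked index. It is only a sketch under stated assumptions. The memory layout, the function names, and the choice of index masking as the per-site fix are mine, not taken from the explanation quoted above, and Python itself does not speculate; the function simply models what the body of the check would do if the processor guessed the branch wrongly.

```python
# Toy model only: the layout and names are illustrative assumptions.
ARRAY1_SIZE = 4
MEMORY = [1, 2, 3, 0,   # array1 occupies MEMORY[0:4]
          0, 0, 13]     # unrelated data; MEMORY[6] plays the secret


def clamp_index(x, size):
    """Return x if 0 <= x < size, else 0, without an if statement.

    Assumes x is non-negative, like an unsigned array index.
    """
    mask = -int(x < size)   # -1 (all one bits) when in bounds, else 0
    return x & mask


def speculative_body(x, mitigated):
    """Model the body of `if x < ARRAY1_SIZE:` running after a misprediction.

    Returns the set of array2 slots whose cache lines would be warmed.
    """
    if mitigated:
        x = clamp_index(x, ARRAY1_SIZE)
    value = MEMORY[x]       # array1[x]; reads past array1 when x is too big
    return {value}          # array2[value] would be loaded, caching that slot


print(speculative_body(6, mitigated=False))  # {13}
print(speculative_body(6, mitigated=True))   # {1}
print(speculative_body(2, mitigated=True))   # {3}
```

Without the clamp, the out-of-bounds index 6 makes the secret value 13 select which slot gets cached. With it, the index collapses to 0 and only in-bounds data can influence the cache, while a legitimate index such as 2 is unchanged. Because each vulnerable sequence needs this kind of treatment where it occurs, the fix is per binary rather than system-wide.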


David. said...

Sam Varghese at IT Wire reviews the sad history of Intel's PR reaction to hardware bugs, focusing on 1997's F00F bug:

"Intel's "judo-move response" was to create an information page claiming it dealt with the bug by linking to each of the various x86 OS vendors' bug-fix pages.

The company was effectively saying, "Here, we fixed the grave defect in our CPU by sitting on our asses and letting software coders work around our error," he wrote. "The press, of course, co-operated by simply pointing people to Intel's page and implying that Intel 'developed a fix'. That's what they're going to do this time, too, I'm sure of that."

David. said...

As if to emphasize how fundamental speculative execution has become, NVIDIA just announced:

"NVIDIA is providing an initial security update to mitigate aspects of Google Project Zero’s January 3, 2018 publication of novel information disclosure attacks that combine CPU speculative execution with known side channels.

The vulnerability has three known variants:

Variant 1 (CVE-2017-5753): Mitigations are provided with the security update included in this bulletin. NVIDIA expects to work together with its ecosystem partners on future updates to further strengthen mitigations.
Variant 2 (CVE-2017-5715): NVIDIA’s initial analysis indicates that the NVIDIA GPU Display Driver is potentially affected by this variant. NVIDIA expects to work together with its ecosystem partners on future updates for this variant.
Variant 3 (CVE-2017-5754): At this time, NVIDIA has no reason to believe that the NVIDIA GPU Display Driver is vulnerable to this variant."

That is, the GPUs are vulnerable to "bounds check bypass", potentially vulnerable to "branch target injection", and not to "rogue data cache load".

David. said...

Intel's Annus Horribilis continues with yet another vulnerability in their Management Engine:

"power up the target machine, and press CTRL+P during boot. The attacker then may log into Intel Management Engine BIOS Extension (MEBx) using the default password "admin", as this is most likely unchanged on most corporate laptops."

David. said...

More than raised eyebrows needed here:

"Intel quietly warned computer manufacturers at the end of November that its chips were insecure due to design flaws, according to an internal Chipzilla document.

French tech publication LeMagIT reported this week it had obtained a top-secret Intel memo sent to OEM customers on November 29 under a confidentiality and non-disclosure agreement, meaning the hardware makers were banned from discussing the file's contents.


The date of the disclosure to OEMs is likely to raise eyebrows as it happened on the same day Intel chief exec Brian Krzanich sold shares in his company worth $25m before tax."

David. said...

Lily Hay Newman's Meltdown and Spectre Patching Has Been a Total Train Wreck is a good overview of the ongoing dumpster fire.

David. said...

It's not just Meltdown & Spectre. Thomas Claburn at The Register has a list of four other major CPU bugs in just the past year:

"In 2015, Microsoft senior engineer Dan Luu forecast a bountiful harvest of chip bugs in the years ahead.

"We’ve seen at least two serious bugs in Intel CPUs in the last quarter, and it’s almost certain there are more bugs lurking," he wrote. "There was a time when a CPU family might only have one bug per year, with serious bugs happening once every few years, or even once a decade, but we’ve moved past that."

Thanks to growing chip complexity, compounded by hardware virtualization, and reduced design validation efforts, Luu argued, the incidence of hardware problems could be expected to increase."

David. said...

Jean-Louis Gassée's readable Beyond Spectre & Meltdown CPU Bugs points to two useful resources: Eben Upton's explainer of how this class of bugs works (while pointing out that because the ARM cores used in the Raspberry Pi don't speculate, they aren't vulnerable), and a 1995 paper, The Intel 80x86 Processor Architecture: Pitfalls for Secure Systems, by Olin Sibert, Phillip A Porras and Robert Lindell.

The paper describes a series of covert signaling channels between two untrusted processes provided by the x86 architecture as it was in 1995. These are different from Spectre and Meltdown in that they involve two processes cooperating to communicate, rather than a malicious and a victim process. It also describes 8 reported flaws that constitute security vulnerabilities, and 9 reported flaws that allow unprivileged code to hang the CPU, in various 386 and 486 versions. The paper makes it clear that, even with CPUs vastly simpler than today's, it was hard to prevent insecurities in both architecture and implementation.

David. said...

"The Spectre vulnerability is here to stay. Even if you choose to ignore it, the problem still exists. This is potentially a very bad thing for public cloud vendors. It may end up being great for chip manufacturers. It's fantastic for VMware." is the start of an interesting article by Trevor Pott at The Register. He discusses the implications for cloud providers of a known but unfixable vulnerability that allows a malicious process to spy on others with which it shares a host:

"This isn't exactly good news if you're a public cloud provider that is trying to build enough trust to absorb a significant percentage of the world's regulated workloads. It's one thing for software vulnerabilities to exist, it's another to have known hardware vulnerabilities. That's not good when you're selling the concept of shared infrastructure."

David. said...

Paul McLellan at Cadence connects the dots:

"I wrote yesterday about the two exploits, Spectre and Meltdown. I think that the most amazing thing about the security weakness exposed is that it has been around for 20 years, in dozens of microprocessors, before coming to light this year. The only equivalent thing that I can remember was when Ken Thompson revealed, in his acceptance for the Turing Award, that "I cannot be trusted."

Paul points out that the same technique Ken may or may not have used to bury an undetectable backdoor in Unix may or may not have been used to bury undetectable backdoors in chips, by compromising the compiler used to build the EDA tools used to design them.

Tip of the hat to Cory Doctorow, who also makes some good points.

David. said...

"Researchers have discovered more than 130 malware samples designed to exploit the recently disclosed Spectre and Meltdown CPU vulnerabilities. While a majority of the samples appear to be in the testing phase, we could soon start seeing attacks." writes Eduard Kovacs at Security Week.

David. said...

Red Hat's Jon Masters gave an EE380 talk entitled Exploiting modern microarchitectures: Meltdown, Spectre and other hardware attacks. It is a careful, comprehensive discussion that starts from the basics of Instruction Set Architectures, continues through their implementation via a microarchitecture, and ends with the way features of the microarchitecture enable side-channel attacks (including Meltdown and Spectre). The abstract is here, the slides are here.

David. said...

Brendan Gregg has a very detailed look at the performance impact of KPTI, the Linux fix for Meltdown:

"Practically, I'm expecting the cloud systems at my employer (Netflix) to experience between 0.1% and 6% overhead with KPTI due to our syscall rates, and I'm expecting we'll take that down to less than 2% with tuning: using 4.14 with pcid support, huge pages (which can also provide some gains), syscall reductions, and anything else we find."

Hat tip to Simon Sharwood at The Register.

David. said...

"Microsoft's new compiler feature will insert an instruction to block speculation in code that the compiler detects as being vulnerable to Spectre." writes Peter Bright at Ars Technica.

But to avoid crippling performance degradation, it uses heuristics:

"unfortunately, Microsoft's heuristics are tightly constrained. They detect some Spectre-vulnerable code patterns, but not all of them. Even small changes to a vulnerable piece of code can defeat Microsoft's heuristics—the code will be vulnerable to Spectre, but the compiler won't add lfence instructions to protect it."

David. said...

"In a research paper – "MeltdownPrime and SpectrePrime: Automatically-Synthesized Attacks Exploiting Invalidation-Based Coherence Protocols" – out this month, bit boffins from Princeton University and chip designer Nvidia describe variants of Meltdown and Spectre exploit code that can be used to conduct side-channel timing attacks." writes Thomas Claburn at The Register.