Friday, January 5, 2018

Meltdown & Spectre

This hasn't been a good few months for Intel. I wrote in November about the vulnerabilities in their Management Engine. Now they and other CPU manufacturers are facing Meltdown and Spectre, three major vulnerabilities caused by side-effects of speculative execution. The public release of details about these vulnerabilities was rushed, and the initial reaction was less than adequate.

The three vulnerabilities are very serious, but mitigations are in place and appear to be less costly than reports focused on the worst case would lead you to believe. Below the fold, I look at the reaction, explain what speculative execution means, and point to the best explanation I've found of where the vulnerabilities come from and what the mitigations do.

Although CPUs from AMD and ARM are also affected, Intel's initial response was pathetic, as Peter Bright reports at Ars Technica:
The company's initial statement, produced on Wednesday, was a masterpiece of obfuscation. It contains many statements that are technically true—for example, "these exploits do not have the potential to corrupt, modify, or delete data"—but utterly beside the point. Nobody claimed otherwise! The statement doesn't distinguish between Meltdown—a flaw that Intel's biggest competitor, AMD, appears to have dodged—and Spectre and, hence, fails to demonstrate the unequal impact on the different company's products.
In addition, Intel's CEO is suspected of insider trading on information about these vulnerabilities:
Brian Krzanich, chief executive officer of Intel, sold millions of dollars' worth of Intel stock—all he could part with under corporate bylaws—after Intel learned of Meltdown and Spectre, two related families of security flaws in Intel processors.
Not a good look for Intel. Nor for AMD:
AMD's response has a lot less detail. AMD's chips aren't believed susceptible to the Meltdown flaw at all. The company also says (vaguely) that it should be less susceptible to the branch prediction attack.

The array bounds problem has, however, been demonstrated on AMD systems, and for that, AMD is suggesting a very different solution from that of Intel: specifically, operating system patches. It's not clear what these might be—while Intel released awful PR, it also produced a good whitepaper, whereas AMD so far has only offered PR—and the fact that it contradicts both Intel (and, as we'll see later, ARM's) response is very peculiar.
The public release of details about Meltdown and Spectre was rushed, as developers not read-in to the problem started figuring out what was going on. This may have been due to an AMD engineer's comment:
Just after Christmas, an AMD developer contributed a Linux patch that excluded AMD chips from the Meltdown mitigation. In the note with that patch, the developer wrote, "The AMD microarchitecture does not allow memory references, including speculative references, that access higher privileged data when running in a lesser privileged mode when that access would result in a page fault."
What is speculative execution? Some things a CPU does, such as fetching data from main memory after a cache miss, take hundreds of clock cycles. It is a waste to stop the CPU while it waits for these operations to complete, so the CPU continues to execute "speculatively". For example, it can guess which way it is likely to go at a branch, and head off down that path ("branch prediction"). If it is right, it has saved a lot of time. If it is wrong, the processor state accumulated during the speculative execution has to be hidden from the real program.
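
To make "guessing which way a branch will go" concrete, here is a minimal sketch of a textbook 2-bit saturating-counter predictor. The names TwoBitPredictor and count_mispredictions are made up for this illustration, and real CPUs use far more elaborate predictors, so treat it as a teaching model rather than a description of any actual chip.

```python
class TwoBitPredictor:
    """Textbook 2-bit saturating counter (illustrative, not any real CPU's design).

    States 0 and 1 predict "not taken"; states 2 and 3 predict "taken".
    """

    def __init__(self, state=2):
        self.state = state  # start at "weakly taken"

    def predict(self):
        return self.state >= 2

    def update(self, taken):
        # Move one step toward the observed outcome, saturating at 0 and 3.
        if taken:
            self.state = min(3, self.state + 1)
        else:
            self.state = max(0, self.state - 1)


def count_mispredictions(outcomes, predictor=None):
    """Feed a sequence of branch outcomes to the predictor and count wrong guesses."""
    predictor = predictor or TwoBitPredictor()
    misses = 0
    for taken in outcomes:
        if predictor.predict() != taken:
            misses += 1
        predictor.update(taken)
    return misses


# A loop branch that is taken 9 times and then falls through, repeated 10 times.
loop_branch = ([True] * 9 + [False]) * 10
print(count_mispredictions(loop_branch), "mispredictions out of", len(loop_branch))
# prints: 10 mispredictions out of 100
```

The predictor is only wrong at each loop exit, which is why speculating down the predicted path pays off so often.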

Modern processors have lots of hardware supporting speculative execution. Meltdown and Spectre are both due to cases where the side-effects of speculative execution on this hardware are not completely hidden. They can be revealed, for example, by carefully timing operations in the real program, which state left behind by speculation can make faster or slower than normal.
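
The idea of a timing side-channel can be sketched with a toy model. Everything here is an assumption made for illustration: ToyCache, speculative_run, recover_byte and the cycle counts are invented, and no real hardware is being measured. The point is only that the discarded speculative result leaves a cache footprint that a later timing pass can read back.

```python
SLOW, FAST = 200, 4  # made-up cycle counts for a cache miss and a cache hit


class ToyCache:
    """A set of cached line numbers; a stand-in for a real data cache."""

    def __init__(self):
        self.lines = set()

    def flush(self):
        self.lines.clear()

    def access(self, line):
        """Return the simulated latency and leave the line cached."""
        latency = FAST if line in self.lines else SLOW
        self.lines.add(line)
        return latency


def speculative_run(cache, secret_byte):
    """Model a mispredicted path: the register result is discarded, the cache fill is not."""
    register = secret_byte   # speculative state, never visible to the program
    cache.access(register)   # dependent load touches a line chosen by the secret
    register = None          # rollback: architectural state is restored
    return register


def recover_byte(cache):
    """Time an access to each of the 256 candidate lines; the fast one was touched."""
    timings = {line: cache.access(line) for line in range(256)}
    return min(timings, key=timings.get)


cache = ToyCache()
cache.flush()
assert speculative_run(cache, secret_byte=0x2A) is None  # nothing leaks architecturally
print(hex(recover_byte(cache)))
# prints: 0x2a
```

In this model the program never sees the secret directly, yet the one fast access among 256 slow ones identifies it.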

The clearest explanation of the three vulnerabilities I've seen is from

  • Variant 1 (CVE-2017-5753), “bounds check bypass.” This vulnerability affects specific sequences within compiled applications, which must be addressed on a per-binary basis.
  • Variant 2 (CVE-2017-5715), “branch target injection”. This variant may either be fixed by a CPU microcode update from the CPU vendor, or by applying a software mitigation technique called “Retpoline” to binaries where concern about information leakage is present. This mitigation may be applied to the operating system kernel, system programs and libraries, and individual software programs, as needed.
  • Variant 3 (CVE-2017-5754), “rogue data cache load.” This may require patching the system’s operating system. For Linux there is a patchset called KPTI (Kernel Page Table Isolation) that helps mitigate Variant 3. Other operating systems may implement similar protections - check with your vendor for specifics.


David. said...

Sam Varghese at IT Wire reviews the sad history of Intel's PR reaction to hardware bugs, focusing on 1997's F00F bug:

"Intel's "judo-move response" was to create an information page claiming it dealt with the bug by linking to each of the various x86 OS vendors' bug-fix pages.

The company was effectively saying, "Here, we fixed the grave defect in our CPU by sitting on our asses and letting software coders work around our error," he wrote. "The press, of course, co-operated by simply pointing people to Intel's page and implying that Intel 'developed a fix'. That's what they're going to do this time, too, I'm sure of that."

David. said...

As if to emphasize how fundamental speculative execution has become, NVIDIA just announced:

"NVIDIA is providing an initial security update to mitigate aspects of Google Project Zero’s January 3, 2018 publication of novel information disclosure attacks that combine CPU speculative execution with known side channels.

The vulnerability has three known variants:

Variant 1 (CVE-2017-5753): Mitigations are provided with the security update included in this bulletin. NVIDIA expects to work together with its ecosystem partners on future updates to further strengthen mitigations.
Variant 2 (CVE-2017-5715): NVIDIA’s initial analysis indicates that the NVIDIA GPU Display Driver is potentially affected by this variant. NVIDIA expects to work together with its ecosystem partners on future updates for this variant.
Variant 3 (CVE-2017-5754): At this time, NVIDIA has no reason to believe that the NVIDIA GPU Display Driver is vulnerable to this variant."

That is, NVIDIA's GPU display driver is affected by "bounds check bypass", potentially affected by "branch target injection", and apparently not vulnerable to "rogue data cache load".

David. said...

Intel's Annus Horribilis continues with yet another vulnerability in their Management Engine:

"power up the target machine, and press CTRL+P during boot. The attacker then may log into Intel Management Engine BIOS Extension (MEBx) using the default password "admin", as this is most likely unchanged on most corporate laptops."