Tuesday, February 13, 2018

Correlated Cryptojacking

On February 11 at least 4,275 Web sites were found to have been simultaneously cryptojacked:
they include The City University of New York (cuny.edu), Uncle Sam's court information portal (uscourts.gov), Lund University (lu.se), the UK's Student Loans Company (slc.co.uk), privacy watchdog The Information Commissioner's Office (ico.org.uk) and the Financial Ombudsman Service (financial-ombudsman.org.uk), plus a shedload of other .gov.uk and .gov.au sites, UK NHS services, and other organizations across the globe.

Manchester.gov.uk, NHSinform.scot, agriculture.gov.ie, Croydon.gov.uk, ouh.nhs.uk, legislation.qld.gov.au, the list goes on.
They were all running Coinhive's Monero miner in visitors' browsers. How and why did this happen and what should these sites have been doing to prevent it? Follow me below the fold.

Some resources for a page
Today's Web pages are constructed from resources from all over the Web, not just from the site at which you pointed your browser. Here, for example, is part of NoScript's listing of where all the resources used by a page at Talking Points Memo came from. All 36 of them. You'll notice that I use NoScript to block almost all of them from actually executing. As blissex wrote in this comment, we are living:
In an age in which every browser gifts a free-to-use, unlimited-usage, fast VM to every visited web site, and these VMs can boot and run quite responsive 3D games or Linux distributions
You (and I) need to be very careful about gifting these VMs to sites we don't trust, which is why I use NoScript. The resources Talking Points Memo wants me to execute are not delivering me information I want. A few of them want to show me ads, probably for things I just purchased and so will definitely not purchase again. Most of them want to track me. Back in 2015 Georgios Kontaxis and Monica Chew won "Best Paper" at the Web 2.0 Security and Privacy workshop for Tracking Protection in Firefox for Privacy and Performance (PDF). They demonstrated that Tracking Protection provided:
a 67.5% reduction in the number of HTTP cookies set during a crawl of the Alexa top 200 news sites. [and] a 44% median reduction in page load time and 39% reduction in data usage in the Alexa top 200 news site.
Some of them would likely have been malvertising, using the incredibly complex and opaque advertising ecosystem as an efficient channel for distributing malware. But increasingly, as in this case, some of them would be cryptojacking, mining cryptocurrency in your browser. It turns out that, although the return from an individual browser is small, as Brannon Dorsey demonstrated it is easy to collect vast amounts of computing resource by advertising:
Anyone can make an account, create an ad with god-knows-what Javascript in it, then pay to have the network serve that ad up to thousands of browser.

So that's what Dorsey did -- very successfully. Within about three hours, his code (experimental, not malicious, apart from surreptitiously chewing up processing resources) was running on 117,852 web browsers, on 30,234 unique IP addresses. Adtech, it turns out, is a superb vector for injecting malware around the planet.

Some other fun details: Dorsey found that when people loaded his ad, they left the tab open an average of 15 minutes. That gave him huge amounts of compute time -- 327 full days, in fact, for about $15 in ad purchase.
But the ads aren't the only remote resources in the typical page. The language of Web pages used to be HTML, now it is JavaScript. JavaScript is a programming language. People who write programs use libraries, so the typical page loads JavaScript libraries. Libraries are programs. Programs have bugs. As Thomas Claburn at The Register explained last March, this means the JavaScript libraries your browser is executing have bugs:
The web has a security problem: code libraries. Almost 88 per cent of the top 75,000 websites and 47 per cent of .com websites rely on at least one vulnerable JavaScript library.

As described in a recently published paper, "Thou Shalt Not Depend on Me: Analysing the Use of Outdated JavaScript Libraries on the Web," researchers from Northeastern University in Boston, Massachusetts, have found that many websites rely widely on insecure versions of JavaScript libraries and that there's no immediate way to eliminate this problem.
Dan Goodin at Ars Technica understands the real significance of this incident:
If someone can control the JavaScript that the US court system and thousands of other organizations load into their webpages, they can potentially exploit critical browser flaws, steal log-in credentials, and perform other malicious acts. As offensive as drive-by mining is, it's one of the more benign offenses that can result from malicious code that gets executed on our devices.
Chris Williams at The Register explains what happened to the 4,275+ Web sites:
The affected sites all use a fairly popular plugin called Browsealoud, made by Brit biz Texthelp, which reads out webpages for blind or partially sighted people.

This technology was compromised in some way – either by hackers or rogue insiders altering Browsealoud's source code – to silently inject Coinhive's Monero miner into every webpage offering Browsealoud.
Plugin, library, same difference. They're both code from some place else that the page invokes. On Twitter, Prof. Alan Woodward points out what sites should have been doing to prevent this:
This is what happens when you use third party content & don’t ensure its integrity. Just look at all those public sector sites affected. If you wanna know how to stop it read these:
https://scotthelme.co.uk/subresource-integrity/ …
https://scotthelme.co.uk/content-security-policy-an-introduction/ …
And use @reporturi
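Prof. Woodward and security researcher Scott Helme are suggesting sites need to do three things: deploy a Content Security Policy (CSP), use Subresource Integrity (SRI), and monitor CSP violations with reporturi.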
I've written about CSP in the context of Web archives here and here. In the context of the live Web, Scott Helme's introduction to CSP explains its use:
A CSP header allows you to define approved sources for content on your site that the browser can load. By specifying only those sources that you wish the browser to load content from, you can protect your visitors from a whole range of issues. Here is a basic CSP response header.

Content-Security-Policy: script-src 'self'

Going back to the example above of an attacker using a specially crafted comment to load javascript from another domain, this CSP header would prevent the browser loading content from nastyhackers.com. The script-src directive specifies the whitelist of sources that the browser may load scripts from. Using the 'self' keyword is easier than specifying my whole domain and makes the policy a little easier to read once it starts growing. Because the domain nastyhackers.com isn't in the script whitelist, the browser will not load the script content from nastyhackers.com.
If the 4,275+ sites had used CSP headers, the compromised Browsealoud plugin would not have been able to load the JavaScript miner from coinhive.com.
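
As a rough illustration of that claim, here is a minimal Python sketch of how a script-src whitelist decides which scripts may load. It is a deliberately simplified model, not the browser's actual matching algorithm: it only understands 'self' and exact origins, and the function names, the example.gov.uk page, and the script URLs are all assumptions made for the example.

```python
from urllib.parse import urlsplit


def build_csp(script_sources):
    """Return a Content-Security-Policy header value limiting script origins."""
    return "script-src " + " ".join(script_sources)


def script_allowed(policy, page_url, script_url):
    """Simplified model of script-src matching.

    Only the 'self' keyword and exact scheme://host origins are handled;
    real browsers also support wildcards, paths, nonces and hashes.
    """
    directive, *sources = policy.split()
    if directive != "script-src":
        raise ValueError("this sketch only understands a single script-src directive")
    page = urlsplit(page_url)
    script = urlsplit(script_url)
    script_origin = f"{script.scheme}://{script.netloc}"
    for source in sources:
        if source == "'self'":
            # 'self' matches scripts served from the page's own origin.
            if (script.scheme, script.netloc) == (page.scheme, page.netloc):
                return True
        elif source == script_origin:
            return True
    return False


if __name__ == "__main__":
    # Hypothetical policy for a site that uses the Browsealoud plugin.
    policy = build_csp(["'self'", "https://www.browsealoud.com"])
    page = "https://example.gov.uk/"
    for url in (
        "https://www.browsealoud.com/plus/scripts/ba.js",  # assumed plugin URL
        "https://coinhive.com/lib/coinhive.min.js",        # assumed miner URL
    ):
        print(url, script_allowed(policy, page, url))
```

Under this model the script from the whitelisted plugin host is allowed and the one from coinhive.com is refused, because coinhive.com was never listed as an approved source.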

But coinhive.com is not actually the problem. The problem is that your browser is running malicious code from browsealoud.com, and that is a site that would have been in the CSP whitelist. The miscreants could have copied the miner code into the script served from browsealoud.com rather than incorporating it by reference. This is where SRI comes in.

SRI was designed to keep Content Distribution Networks (CDNs) from altering the content they distribute. Scott Helme explains:
Most sites on the Internet these days load some kind of content from a CDN, usually JS and CSS. Whilst this comes with great performance boosts and savings on bandwidth, we're trusting that CDN to load content into our pages, content that could possibly be harmful. Until now, we had no way to verify the content we were loading from the CDN was actually what we expected, it could have been altered or replaced. SRI allows us to check the integrity of the JS or CSS to ensure it's exactly what we were expecting.
How is this done?
SRI allows us to instruct the browser to perform an integrity check on an asset loaded from a 3rd party. By embedding the base64 encoded cryptographic hash digest that we expect for the asset into the script or link tag, the browser can download the asset and check its cryptographic hash digest against the one it was expecting. If the hash of the downloaded asset matches the hash that we provided, then the content is what we were expecting to receive and the browser can safely include the script or style. If the hash doesn't match then we know we can't trust the data and it must be discarded.
There are lots of interesting details about the use of SRI that you can read about in Scott Helme's explanation. But what's relevant here is that SRI doesn't just protect your site's visitors from accidental or malicious corruption of content by CDNs, it protects them from compromise of the originators of resources that your site uses, such as browsealoud.com. Had the sites that were co-opted into mining Monero used SRI to force the browser to check the hash of the code from browsealoud.com, their visitors' browsers would have detected that the code had been altered and refused to run it.
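
The hash check Helme describes can be sketched in a few lines of Python. This is only an illustration of the mechanism, assuming SHA-384 as the digest. The helper names, the script URL and the script contents are made up for the example, and the verification step stands in for what the browser does itself.

```python
import base64
import hashlib


def sri_hash(content: bytes, algorithm: str = "sha384") -> str:
    """Return an SRI integrity value: '<algorithm>-<base64 of the raw digest>'."""
    digest = hashlib.new(algorithm, content).digest()
    return f"{algorithm}-{base64.b64encode(digest).decode('ascii')}"


def script_tag(url: str, content: bytes) -> str:
    """Build a script tag that pins the expected hash of the third-party script."""
    return (
        f'<script src="{url}" integrity="{sri_hash(content)}" '
        'crossorigin="anonymous"></script>'
    )


def integrity_ok(downloaded: bytes, integrity: str) -> bool:
    """Mimic the browser's check: re-hash the download and compare to the pin."""
    algorithm, _, _ = integrity.partition("-")
    return sri_hash(downloaded, algorithm) == integrity


if __name__ == "__main__":
    # Hypothetical contents of the plugin script when the site reviewed it.
    reviewed = b"function readAloud() { /* text to speech */ }"
    pinned = sri_hash(reviewed)
    print(script_tag("https://www.browsealoud.com/plus/scripts/ba.js", reviewed))

    # The same script after an attacker appends a loader for a miner.
    tampered = reviewed + b"; loadMiner();"
    print(integrity_ok(reviewed, pinned))
    print(integrity_ok(tampered, pinned))
```

The unmodified script passes the check and the tampered one fails it, so a browser enforcing the pinned hash would discard the altered file. The cost is that the site must update the integrity value whenever the third party legitimately changes its script.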

But what of reporturi? It is a service that allows webmasters to track violations of the CSP they set. In this case, it would have alerted the sites to the fact that attempts were being made to load JavaScript from coinhive.com.
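
To make the reporting step concrete, here is a hedged Python sketch of a policy that asks browsers to report violations, together with a small tally of the blocked script sources found in incoming reports. The report-uri directive and the "csp-report" JSON wrapper are part of CSP. The endpoint URL, the function names and the sample report are hypothetical and exist only for this illustration.

```python
import json
from collections import Counter


def csp_with_reporting(script_sources, report_endpoint):
    """Return a CSP header value that also tells browsers where to send reports."""
    return f"script-src {' '.join(script_sources)}; report-uri {report_endpoint}"


def blocked_script_uris(raw_reports):
    """Count blocked URIs across script-src violation reports (JSON strings)."""
    counts = Counter()
    for raw in raw_reports:
        report = json.loads(raw).get("csp-report", {})
        if report.get("violated-directive", "").startswith("script-src"):
            counts[report.get("blocked-uri", "unknown")] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical reporting endpoint operated by, or on behalf of, the site.
    print(csp_with_reporting(["'self'", "https://www.browsealoud.com"],
                             "https://reports.example.org/csp"))

    # Hypothetical report of the kind a browser would POST after blocking the miner.
    sample = json.dumps({
        "csp-report": {
            "document-uri": "https://example.gov.uk/",
            "violated-directive": "script-src",
            "blocked-uri": "https://coinhive.com",
        }
    })
    for uri, count in blocked_script_uris([sample, sample]).items():
        print(uri, count)
```

A sudden run of reports naming an unexpected host such as coinhive.com is the kind of signal that would have told a webmaster something upstream had been compromised.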

3 comments:

David. said...

The gross from the Browsealoud hack was just $24!

David. said...

Epidemic of cryptojacking can be traced to escaped NSA superweapon writes Cory Doctorow:

"The epidemic of cryptojacking malware isn't merely an outgrowth of the incentive created by the cryptocurrency bubble -- that's just the motive, and the all-important the means and opportunity were provided by the same leaked NSA superweapon that powered last year's Wannacry ransomware epidemic."

David. said...

Hackers Hijacked Tesla's Amazon Cloud Account To Mine Cryptocurrency reports msmash at Slashdot.