Thursday, June 25, 2020

Deanonymizing Ethereum Users

In last January's Bitcoin's Lightning Network I discussed A Cryptoeconomic Traffic Analysis of Bitcoin’s Lightning Network by the Hungarian team of Ferenc Béres, István A. Seres, and András A. Benczúr. They demolished the economics of the Lightning Network, writing:
Our findings on the estimated revenue from transaction fees are in line with the widespread opinion that participation is economically irrational for the majority of the large routing nodes who currently hold the network together. Either traffic or transaction fees must increase by orders of magnitude to make payment routing economically viable.
Below the fold I comment on their latest work.

It has been clear for some time that the privacy of Bitcoin's and similar blockchains is illusory. Companies such as Chainalysis exist to pierce their shields, despite the availability of privacy enhancements such as mixers. Now Blockchain is Watching You: Profiling and Deanonymizing Ethereum Users by the same team plus Mikerah Quintyne-Collins points out that the same applies even more strongly to Ethereum:
Ethereum is the largest public blockchain by usage. It applies an account-based model, which is inferior to Bitcoin’s unspent transaction output model from a privacy perspective. As the account-based models for blockchains force address reuse, we show how transaction graphs and other quasi-identifiers of users such as time-of-day activity, transaction fees, and transaction graph analysis can be used to reveal some account owners. To the best of our knowledge, we are the first to propose and implement Ethereum user profiling techniques based on user quasi-identifiers.
Mixers appeared in the Bitcoin ecosystem in an attempt to mitigate its inadequate privacy by obscuring the history of transactions in a herd of unrelated ones. The programmability of Ethereum allows not just for mixers, but also more complex ways to obscure history:
Due to the privacy shortcomings of the account based model, recently several privacy-enhancing overlays have been deployed on Ethereum, such as noncustodial, trustless coin mixers and confidential transactions. We assess the strengths and weaknesses of the existing privacy-enhancing solutions and quantitatively assess the privacy guarantees of the Ethereum blockchain and ENS. We identify several heuristics as well as profiling and deanonymization techniques against some popular and emerging privacy-enhancing tools.
Because "the account-based models for blockchains force address reuse", addresses in Ethereum are more persistent, so there is a need to name them:
Ethereum Name Service (ENS) is a distributed, open, and extensible naming system based on the Ethereum blockchain. ... ENS maps human-readable names like alice.eth to machine-readable identifiers such as Ethereum addresses. Therefore, ENS provides a more user-friendly way of transferring assets on Ethereum, where users can use ENS names (alice.eth) as recipient addresses instead of the error-prone hexadecimal Ethereum addresses.
Béres et al Table 1
The research team collected addresses with which to experiment:
  • Twitter: By using the Twitter API, we were able to collect 890 ENS names included in Twitter profiles, and discover the connected Ethereum addresses.
  • Humanity DAO: a human registry of Ethereum users, whose entries can include a Twitter handle in addition to the Ethereum address.
  • TornadoCash mixer contracts: We collected all Ethereum addresses that issued or received transactions from TornadoCash mixers.
Béres et al Figure 2
And the transactions in which they were involved:
By using the Etherscan blockchain explorer API, we collected 1,155,188 transactions sent or received by the addresses in our collection. The final transaction graph contains 159,339 addresses. The transactions span from 2015-07-30 till 2020-04-04.
They used three quasi-identifiers to link multiple addresses in their collection to a single user:
  • Time-of-day transaction activity (Section 6.1):
    Ethereum blockchain transaction timestamps reveal the daily activity patterns of the account owner
    The idea being that the more similar the daily pattern of address usage, the more likely the addresses belong to the same user.
  • Gas price distribution (Section 6.2):
    Ethereum transactions also contain the gas price, which is usually automatically set by wallet softwares. Users rarely change this setting manually. Most wallet user interfaces offer three levels of gas prices, slow, average, and fast where the fast gas price guarantees almost immediate inclusion in the blockchain.
    The idea being that the more similar the pattern of gas price selection, the more likely the addresses belong to the same user.
  • Transaction graph analysis (Section 6.3):
    The set of addresses used in interactions characterize a user. Users with multiple accounts might interact with the same addresses or services from most of them. Furthermore, as users move funds between their personal addresses, they may unintentionally reveal their address clusters.
    As with Bitcoin, slight slips in operational security lead to deanonymization. In practice, few users can maintain adequate OpSec.
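The time-of-day quasi-identifier is easy to illustrate. A minimal sketch, with synthetic timestamps (the paper's actual feature extraction and similarity measures are more elaborate): build a normalized 24-bin histogram of each address's transaction hours, then compare histograms with cosine similarity. Addresses with similar daily rhythms score close to 1, addresses active at disjoint hours score close to 0:

```python
from collections import Counter
from datetime import datetime, timezone
import math

def hourly_profile(timestamps):
    """Normalized 24-bin histogram of transaction hours (UTC)."""
    hours = Counter(datetime.fromtimestamp(t, timezone.utc).hour for t in timestamps)
    total = sum(hours.values())
    return [hours.get(h, 0) / total for h in range(24)]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

# Synthetic data: two "night owl" addresses active at 02:00-03:00 UTC,
# one "day" address active at 14:00-15:00 UTC, one tx per day for 30 days.
BASE = 1_600_041_600  # a midnight UTC, so hour offsets are exact
DAY = 86_400
night_a = [BASE + d * DAY + (2 + d % 2) * 3600 for d in range(30)]
night_b = [BASE + d * DAY + (2 + (d + 1) % 2) * 3600 for d in range(30)]
day_c   = [BASE + d * DAY + (14 + d % 2) * 3600 for d in range(30)]

sim_same = cosine(hourly_profile(night_a), hourly_profile(night_b))
sim_diff = cosine(hourly_profile(night_a), hourly_profile(day_c))
print(round(sim_same, 3), round(sim_diff, 3))  # → 1.0 0.0
```

The same ranking idea carries over to the gas-price quasi-identifier: replace the hourly histogram with a histogram of chosen gas prices.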
The authors don't expect these techniques to deliver complete deanonymization:
Exact identification is an overly ambitious goal in our experiments, which aim to use very limited public information to rank candidate pairs and quantify the leaked information as risk for a potential systematic deanonymization attack. For this reason, we quantify non-exact matches, since even though our deanonymizing tools might not exactly find a mixing address, they can radically reduce the anonymity set, which is still harmful to privacy.
Béres et al Figure 15
What their techniques deliver is a list of address pairs ranked from most to least likely to belong to the same user. Figure 15 compares time-of-day, gas price and two forms of graph analysis, showing the fraction of their 129 ground-truth address pairs (from ENS names with exactly two addresses) that appear in the top X of the ranked list. Graph analysis is clearly better than the alternatives; combining the two graph analysis techniques makes more than 75 of the top 100 ranked pairs true pairs from their test set. This isn't complete deanonymization, but it is way more than enough to force anyone using Ethereum for nefarious purposes to resort to privacy-enhancing technology.

Thus in Section 7 the authors attack the most popular Ethereum mixer:
The Tornado Cash (TC) Mixers are sets of trustless Ethereum smart contracts allowing Ethereum users to enhance their anonymity. A TC mixer contract holds equal amounts of funds (ether or other ERC-20 tokens) from a set of depositors. One contract typically holds one type of asset. In case of the TC mixer, anonymity is achieved by applying zkSNARKs [22]. Each depositor inserts a hash value in a Merkle-tree. Later, at withdraw time, each legitimate withdrawer can prove unlinkably with a zero-knowledge proof that they know the pre-image of a previously inserted hash leaf in the Merkle-tree. Subsequently, users can withdraw their asset from the mixer whenever they consider that the size of the anonymity set is satisfactory.
As usual in cryptocurrencies, the technology depends upon impractically perfect OpSec by users. The authors base three address-linking heuristics on this observation:
  • A user uses the same address for both a deposit and the subsequent withdrawal.
  • A user manually sets the same unique gas value for both a deposit and the subsequent withdrawal.
  • A user uses addresses between which a transaction can be found for both a deposit and the subsequent withdrawal.
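The three heuristics can be sketched in a few lines. This is a toy reconstruction over simplified deposit/withdraw records with hypothetical field names and addresses, not the authors' actual pipeline, which works over full chain data:

```python
from collections import Counter

def link_heuristics(deposits, withdraws, tx_graph):
    """Return (deposit, withdraw, reason) triples flagged by the heuristics.

    deposits/withdraws: lists of {'addr': str, 'gas': int} dicts (gas price in wei).
    tx_graph: set of (sender, receiver) address pairs observed outside the mixer.
    """
    dep_gas = Counter(d['gas'] for d in deposits)
    wit_gas = Counter(w['gas'] for w in withdraws)
    links = []
    for d in deposits:
        for w in withdraws:
            if d['addr'] == w['addr']:
                links.append((d, w, 'same address'))
            elif (d['gas'] == w['gas'] and dep_gas[d['gas']] == 1
                  and wit_gas[w['gas']] == 1):
                # A real implementation would also exclude common wallet
                # defaults, which are shared by many unrelated users.
                links.append((d, w, 'unique gas price'))
            elif ((d['addr'], w['addr']) in tx_graph
                  or (w['addr'], d['addr']) in tx_graph):
                links.append((d, w, 'addresses transact directly'))
    return links

# Hypothetical records: 0xa1 reuses its address, a manually-set gas price
# of 21 gwei + 1 wei links a deposit to 0xc3, and 0xb2 once paid 0xc3.
deposits  = [{'addr': '0xa1', 'gas': 21_000_000_001},
             {'addr': '0xb2', 'gas': 20_000_000_000}]
withdraws = [{'addr': '0xa1', 'gas': 19_000_000_000},
             {'addr': '0xc3', 'gas': 21_000_000_001}]
graph = {('0xb2', '0xc3')}
for d, w, why in link_heuristics(deposits, withdraws, graph):
    print(d['addr'], '->', w['addr'], ':', why)
```

All three links here are exactly the kind of OpSec slip the heuristics exploit: none requires breaking the zkSNARK, only observing user behavior around it.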
Their Table 2 shows that these three heuristics linked nearly 18% of withdrawals to their deposits in the most popular 0.1ETH mixer. This is not just bad for the depositors involved, but for all users; it means the anonymity set is at most 82% as big as they think it is.

They observe endemic OpSec failures:
In Figure 17, we observe that most users of the linked deposit-withdraw pairs leave their deposit for less than a day in the mixer contract. This user behavior can be exploited for deanonymization by assuming that the vast majority of the deposits are always withdrawn after one or two days.
This is really bad, as they point out:
For example, for the 0.1ETH mixer the original average anonymity set size of 400 could be reduced to almost 12 by assuming that the deposit occurred within one day of the withdraw.
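The arithmetic behind this reduction is simply counting which deposits fall inside the assumed time window. A toy sketch, with hypothetical uniformly-spaced deposit times (the real deposit distribution is burstier, which is why the paper reports a reduction to about 12 rather than the handful this toy data yields):

```python
DAY = 86_400  # seconds

def anonymity_set(withdraw_time, deposit_times, window=DAY):
    """Deposits that could plausibly back this withdrawal, assuming the
    matching deposit occurred within `window` seconds before it."""
    return [t for t in deposit_times if withdraw_time - window <= t <= withdraw_time]

# 400 deposits spread evenly over 100 days, one withdrawal at day 100:
deposits = [d * (100 * DAY) // 400 for d in range(400)]
candidates = anonymity_set(100 * DAY, deposits)
print(len(deposits), '->', len(candidates))  # → 400 -> 4
```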
But it isn't the worst:
Even worse, in Figure 19 we observe several addresses receiving more than one withdraws from the 0.1 ETH mixer contract. For instance, there are 85 addresses with two withdraws and 27 addresses with three withdraws. Withdraw clusters cause privacy risk not just for the owner but for all other mixer participants as well. Note that proper usage requires withdraw always to fresh addresses.
In Blockchain: What's Not To Like? I wrote:
In practice the security of a blockchain depends not merely on the security of the protocol itself, but on the security of the core software and the wallets and exchanges used to store and trade its cryptocurrency. This ancillary software has bugs, such as the recently revealed major vulnerability in Bitcoin Core, the Parity Wallet fiasco, and the routine heists using vulnerabilities in exchange software.
But I missed an important point. Almost 21 years ago, in Why Johnny Can't Encrypt: A Usability Evaluation of PGP 5.0, Alma Whitten and J.D. Tygar showed that PGP did not in practice deliver the excellent security it promised in theory because:
User errors cause or contribute to most computer security failures, yet user interfaces for security still tend to be clumsy, confusing, or near-nonexistent. Is this simply due to a failure to apply standard user interface design techniques to security?  We argue that, on the contrary, effective security requires a different usability standard, and that it will not be achieved through the user interface design techniques appropriate to other types of consumer software.

To test this hypothesis, we performed a case study of a security program which does have a good user interface by general standards:  PGP 5.0. ... The analysis found a number of user interface design flaws that may contribute to security failures, and the user test demonstrated that when our test participants were given 90 minutes in which to sign and encrypt a message using PGP 5.0, the majority of them were unable to do so successfully.

We conclude that PGP 5.0 is not usable enough to provide effective security for most computer users, despite its attractive graphical user interface, supporting our hypothesis that user interface design for effective security remains an open problem.
Fourteen years ago Steve Sheng et al revisited the issue in Why Johnny Still Can’t Encrypt: Evaluating the Usability of Email Encryption Software:
We ran a pilot of the study with six novice users using PGP 9 and Outlook Express 6.0. Even though we only performed a pilot study, several patterns emerged early to indicate major problems in PGP 9.
In summary, compared with Whitten’s study of PGP 5, PGP 9 made strides in automatically encrypting emails. The key certification process remains the key issue: PGP 9 has not made any improvements there, and presents multiple instances where the interface does not provide enough cues or feedback for the user.
Three years ago, in When the cookie meets the blockchain: Privacy risks of web payments via cryptocurrencies, Steven Goldfeder, Harry Kalodner, Dillon Reisman and Arvind Narayanan showed how difficult it was for users to make purchases on the Web using cryptocurrencies without sacrificing privacy:
We show how third-party web trackers can deanonymize users of cryptocurrencies. We present two distinct but complementary attacks. On most shopping websites, third-party trackers receive information about user purchases for purposes of advertising and analytics. We show that, if the user pays using a cryptocurrency, trackers typically possess enough information about the purchase to uniquely identify the transaction on the blockchain, link it to the user’s cookie, and further to the user’s real identity. Our second attack shows that if the tracker is able to link two purchases of the same user to the blockchain in this manner, it can identify the user’s entire cluster of addresses and transactions on the blockchain, even if the user employs blockchain anonymity techniques such as CoinJoin.
I'm sure both versions of PGP, the Bitcoin software in use for the Goldfeder et al study, and the Ethereum software in use during Béres et al's study had vulnerabilities. But none of the security lapses in these studies exploited any of them. The user interfaces of security-critical software must be designed so that the user cannot perform actions that impair the security or anonymity the infrastructure is designed to deliver.

For example, it is essential that a mixer such as Tornado Cash not preserve the connection between a deposit and the corresponding hash value in the Merkle tree. Thus it cannot know whether a user is withdrawing to the same address they used for the deposit. This check can only be performed by the user interface software, but Béres et al's study shows it isn't. Similarly, they show that anonymity requires randomized intervals between deposit and withdrawal, which again can only be implemented by the user interface software.
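Since the mixer contract cannot perform these checks, they belong in the client. A minimal sketch of such guards, with hypothetical names and thresholds that are purely illustrative, not Tornado Cash's actual interface:

```python
import random

DAY = 86_400  # seconds

def check_withdrawal(deposit_addr, withdraw_addr, used_withdraw_addrs,
                     deposit_time, now, min_delay=None):
    """Refuse withdrawals that would link deposit and withdrawal.

    Hypothetical client-side guards a mixer front-end could enforce;
    all names and thresholds here are illustrative assumptions.
    """
    if min_delay is None:
        # Randomize the minimum wait so timing doesn't fingerprint users.
        min_delay = random.uniform(1 * DAY, 7 * DAY)
    if withdraw_addr == deposit_addr:
        return (False, 'withdraw address equals deposit address')
    if withdraw_addr in used_withdraw_addrs:
        return (False, 'withdraw address already used; use a fresh one')
    if now - deposit_time < min_delay:
        return (False, 'too soon after deposit; wait longer')
    return (True, 'ok')

# Reusing the deposit address is refused; a fresh address after a
# sufficient delay passes (min_delay pinned here for reproducibility).
ok1, why1 = check_withdrawal('0xa1', '0xa1', set(), 0, 10 * DAY, min_delay=DAY)
ok2, why2 = check_withdrawal('0xa1', '0xd4', {'0xc3'}, 0, 10 * DAY, min_delay=DAY)
print(ok1, why1)
print(ok2, why2)
```

The point is that the guard runs where the linkage information exists: only the user's own software knows which address made the deposit.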

1 comment:

David. said...

Andy Greenberg's Feds Arrest an Alleged $336M Bitcoin-Laundering Kingpin is an object lesson in the unattainable level of opsec needed to remain pseudonymous in the world of cryptocurrencies:

"For a decade, Bitcoin Fog has offered to obscure the source and destination of its customers' cryptocurrency, making it one of the most venerable institutions in the dark web economy. Now the IRS says it has finally identified the Russian-Swedish administrator behind that long-running anonymizing system and charged him with laundering hundreds of millions of dollars worth of bitcoins, much of which was sent to or from dark web drug markets. What gave him away? The trail of his own decade-old digital transactions."

The details are fascinating; it is worth reading the whole post.