Thursday, April 15, 2021

NFTs and Web Archiving

One of the earliest observations of the behavior of the Web at scale was "link rot". There were a lot of 404s, broken links. Research showed that the half-life of Web pages was alarmingly short. Even in 1996 this problem was obvious enough for Brewster Kahle to found the Internet Archive to address it. From the Wikipedia entry for Link Rot:
A 2003 study found that on the Web, about one link out of every 200 broke each week,[1] suggesting a half-life of 138 weeks. This rate was largely confirmed by a 2016–2017 study of links in Yahoo! Directory (which had stopped updating in 2014 after 21 years of development) that found the half-life of the directory's links to be two years.[2]
One might have thought that academic journals were a relatively stable part of the Web, but research showed that their references decayed too, just somewhat less rapidly. A 2013 study found a half-life of 9.3 years. See my 2015 post The Evanescent Web.
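The 138-week figure in the Wikipedia quote follows from treating link breakage as a constant weekly rate. Here is a minimal sketch of that arithmetic, assuming each surviving link breaks independently with probability 1/200 per week. The function name is mine, purely for illustration:

```python
import math

def half_life_weeks(weekly_break_rate: float) -> float:
    """Weeks until half of the links are expected to be broken,
    assuming a constant, independent weekly breakage probability."""
    # The surviving fraction after t weeks is (1 - rate) ** t; solve for 0.5.
    return math.log(0.5) / math.log(1.0 - weekly_break_rate)

weeks = half_life_weeks(1 / 200)
print(f"{weeks:.1f} weeks, about {weeks / 52:.1f} years")
```

The result is a little over 138 weeks, in the same range as the two-year half-life reported for the Yahoo! Directory links.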

I expect you have noticed the latest outbreak of blockchain-enabled insanity, Non-Fungible Tokens (NFTs). Someone "paying $69M for a JPEG" or $560K for a New York Times column attracted a lot of attention. Follow me below the fold for the connection between NFTs, "link rot" and Web archiving.

Kahle's idea for addressing "link rot", which became the Wayback Machine, was to make a copy of the content at some URL, say:
keep the copy for posterity, and re-publish it at a URL like:
What is the difference between the two URLs? The original is controlled by Example.Com, Inc.; they can change or delete it on a whim. The copy is controlled by the Internet Archive, whose mission is to preserve it unchanged "for ever". The original is subject to "link rot"; the copy, one hopes, is not. The Wayback Machine's URLs have three components:
  • locates the archival copy at the Internet Archive.
  • 19960615083712 indicates that the copy was made on 15th June, 1996 at 8:37:12.
  • is the URL from which the copy was made.
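The three components can be pulled apart mechanically. This is a minimal sketch, assuming the familiar `https://web.archive.org/web/<timestamp>/<original URL>` layout. The example.com address is a placeholder of mine, and `split_wayback_url` is a hypothetical helper, not part of any Internet Archive API:

```python
from datetime import datetime

def split_wayback_url(archived: str):
    """Split a Wayback-style URL into (archive prefix, capture time, original URL)."""
    marker = "/web/"
    start = archived.index(marker) + len(marker)
    prefix = archived[:start]
    # What follows the prefix is "<14-digit timestamp>/<original URL>".
    timestamp, original = archived[start:].split("/", 1)
    captured = datetime.strptime(timestamp, "%Y%m%d%H%M%S")
    return prefix, captured, original

prefix, captured, original = split_wayback_url(
    "https://web.archive.org/web/19960615083712/http://www.example.com/page.html"
)
print(prefix)                # where the archival copy lives
print(captured.isoformat())  # when the copy was made
print(original)              # the URL the copy was made from
```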
The fact that the archival copy is at a different URL from the original causes a set of problems that have bedevilled Web archiving. One is that, if the original goes away, all the links that pointed to it break, even though there may be an archival copy to which they could point to fulfill the intent of the link creator. Another is that, if the content at the original URL changes, the link will continue to resolve but the content it returns may no longer reflect the intent of the link creator, although there may be an archival copy that does. Even in the early days of the Web it was evident that Web pages changed and vanished at an alarming rate.

The point is that the meaning of a generic Web URL is "whatever content, or lack of content, you find at this location". That is why URL stands for Uniform Resource Locator. Note the difference with URI, which stands for Uniform Resource Identifier. Anyone can create a URL or URI linking to whatever content they choose, but doing so provides no rights in or control over the linked-to content.

In People's Expensive NFTs Keep Vanishing. This Is Why, Ben Munster reports that:
over the past few months, numerous individuals have complained about their NFTs going “missing,” “disappearing,” or becoming otherwise unavailable on social media. This despite the oft-repeated NFT sales pitch: that NFT artworks are logged immutably, and irreversibly, onto the Ethereum blockchain.
So NFTs have the same problem that Web pages do. Isn't the blockchain supposed to make things immortal and immutable?

Kyle Orland's Ars Technica’s non-fungible guide to NFTs provides an over-simplified explanation:
When NFT’s are used to represent digital files (like GIFs or videos), however, those files usually aren’t stored directly “on-chain” in the token itself. Doing so for any decently sized file could get prohibitively expensive, given the cost of replicating those files across every user on the chain. Instead, most NFTs store the actual content as a simple URI string in their metadata, pointing to an Internet address where the digital thing actually resides.
NFTs are just links to the content they represent, not the content itself. The Bitcoin blockchain actually does contain some images, such as this ASCII portrait of Len Sassaman and some pornographic images. But the blocks of the Bitcoin blockchain were originally limited to 1MB and are now effectively limited to around 2MB, enough space for small image files. What’s the Maximum Ethereum Block Size? explains:
Instead of a fixed limit, Ethereum block size is bound by how many units of gas can be spent per block. This limit is known as the block gas limit ... At the time of writing this, miners are currently accepting blocks with an average block gas limit of around 10,000,000 gas. Currently, the average Ethereum block size is anywhere between 20 to 30 kb in size.
That's a little out-of-date. Currently the block gas limit is around 12.5M gas per block and the average block is about 45KB. Nowhere near enough space for a $69M JPEG. The NFT for an artwork can only be a link. Most NFTs are ERC-721 tokens, providing the optional Metadata extension:
/// @title ERC-721 Non-Fungible Token Standard, optional metadata extension
/// @dev See
///  Note: the ERC-165 identifier for this interface is 0x5b5e139f.
interface ERC721Metadata /* is ERC721 */ {
    /// @notice A descriptive name for a collection of NFTs in this contract
    function name() external view returns (string _name);

    /// @notice An abbreviated name for NFTs in this contract
    function symbol() external view returns (string _symbol);

    /// @notice A distinct Uniform Resource Identifier (URI) for a given asset.
    /// @dev Throws if `_tokenId` is not a valid NFT. URIs are defined in RFC
    ///  3986. The URI may point to a JSON file that conforms to the "ERC721
    ///  Metadata JSON Schema".
    function tokenURI(uint256 _tokenId) external view returns (string);
}
The Metadata JSON Schema specifies an object with three string properties:
  • name: "Identifies the asset to which this NFT represents"
  • description: "Describes the asset to which this NFT represents"
  • image: "A URI pointing to a resource with mime type image/* representing the asset to which this NFT represents. Consider making any images at a width between 320 and 1080 pixels and aspect ratio between 1.91:1 and 4:5 inclusive."
Note that the JSON metadata is not in the Ethereum blockchain; it is only pointed to by the token on the chain. If the art-work is the "image", it is two links away from the blockchain. So, given the evanescent nature of Web links, the standard provides no guarantee that the metadata exists, or is unchanged from when the token was created. Even if it is, the standard provides no guarantee that the art-work exists or is unchanged from when the token was created.
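The two-link structure is easy to model. The sketch below is a toy, not an Ethereum client: the "Web" is an in-memory dictionary, the nft.example URLs are hypothetical placeholders, and `resolve` is a name I made up. It shows that the token can stay exactly the same while what it leads to changes or disappears:

```python
import json

# A toy "Web": URL -> body. Both URLs are hypothetical placeholders.
web = {
    "https://nft.example/meta/1.json": json.dumps({
        "name": "Example artwork",
        "description": "Placeholder metadata",
        "image": "https://nft.example/img/1.jpg",
    }),
    "https://nft.example/img/1.jpg": b"original image bytes",
}

def resolve(token_uri: str, fetch):
    """Follow token URI -> metadata JSON -> image and return the image bytes.
    Returns None if either link fails to resolve."""
    metadata_body = fetch(token_uri)
    if metadata_body is None:
        return None
    image_uri = json.loads(metadata_body)["image"]
    return fetch(image_uri)

token_uri = "https://nft.example/meta/1.json"  # the only part stored on-chain
at_purchase = resolve(token_uri, web.get)

# Later, whoever controls the server replaces the image; the token is unchanged.
web["https://nft.example/img/1.jpg"] = b"something else entirely"
changed = resolve(token_uri, web.get)

# Later still, the metadata file is deleted.
del web["https://nft.example/meta/1.json"]
gone = resolve(token_uri, web.get)

print(at_purchase, changed, gone)
```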

Caveat emptor — Absent unspecified actions, the purchaser of an NFT is buying a supposedly immutable, non-fungible object that points to a URI pointing to another URI. In practice both are typically URLs. The token provides no assurance that either of these links resolves to content, or that the content they resolve to at any later time is what the purchaser believed at the time of purchase. There is no guarantee that the creator of the NFT had any copyright in, or other rights to, the content to which either of the links resolves at any particular time.

There are thus two issues to be resolved about the content of each of the NFT's links:
  • Does it exist? I.e. does it resolve to any content?
  • Is it valid? I.e. is the content to which it resolves unchanged from the time of purchase?
These are the same questions posed by the Holy Grail of Web archiving, persistent URLs.

Assuming existence for now, how can validity be assured? There have been a number of systems that address this problem by switching from naming files by their location, as URLs do, to naming files by their content by using the hash of the content as its name. The idea was the basis for Bram Cohen's highly successful BitTorrent — it doesn't matter where the data comes from provided its integrity is assured because the hash in the name matches the hash of the content.
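A minimal sketch of naming by content follows. It assumes SHA-256 and a plain hex digest as the name; real systems such as BitTorrent and IPFS use their own hash functions and name encodings, so treat the function names and format here as illustrative only:

```python
import hashlib

def content_name(data: bytes) -> str:
    """Name a blob by the SHA-256 hash of its content (hex-encoded)."""
    return hashlib.sha256(data).hexdigest()

def verify(name: str, data: bytes) -> bool:
    """True if the data received matches the name it was requested by."""
    return content_name(data) == name

artwork = b"the bytes of some artwork"
name = content_name(artwork)

print(verify(name, artwork))                 # same bytes, from any source
print(verify(name, artwork + b" (edited)"))  # any change breaks the match
```

The check says nothing about whether anyone still holds the bytes; it only tells you that bytes you did manage to obtain are the ones the name refers to.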

The content-addressable file system most used for NFTs is the Interplanetary File System (IPFS). From its Wikipedia page:
As opposed to a centrally located server, IPFS is built around a decentralized system[5] of user-operators who hold a portion of the overall data, creating a resilient system of file storage and sharing. Any user in the network can serve a file by its content address, and other peers in the network can find and request that content from any node who has it using a distributed hash table (DHT). In contrast to BitTorrent, IPFS aims to create a single global network. This means that if Alice and Bob publish a block of data with the same hash, the peers downloading the content from Alice will exchange data with the ones downloading it from Bob.[6] IPFS aims to replace protocols used for static webpage delivery by using gateways which are accessible with HTTP.[7] Users may choose not to install an IPFS client on their device and instead use a public gateway.
If the purchaser gets both the NFT's metadata and the content to which it refers via IPFS URIs, they can be assured that the data is valid. What do these IPFS URIs look like? The (excellent) IPFS documentation explains:
<CID>
# e.g
Browsers that support IPFS can redirect these requests to your local IPFS node, while those that don't can fetch the resource from the gateway.

You can swap out for your own http-to-ipfs gateway, but you are then obliged to keep that gateway running forever. If your gateway goes down, users with IPFS aware tools will still be able to fetch the content from the IPFS network as long as any node still hosts it, but for those without, the link will be broken. Don't do that.
Note the assumption here that the gateway will be running forever. Note also that only some browsers are capable of accessing IPFS content without using a gateway. Thus the gateway is a single point of failure, although the failure is not complete. In practice NFTs using IPFS URIs are dependent upon the continued existence of Protocol Labs, the organization behind IPFS. The URIs in the NFT metadata are actually URLs; they don't point to IPFS, but to a Web server that accesses IPFS.
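To make the distinction concrete: a gateway URL names a particular Web server, but the content identifier in its path is what names the content. The sketch below assumes the path-style form `https://<host>/ipfs/<CID>`; the hostnames and the CID string are placeholders, and both helper functions are hypothetical:

```python
from urllib.parse import urlparse

def cid_from_gateway_url(url: str) -> str:
    """Extract the CID from a path-style gateway URL: https://<host>/ipfs/<CID>[/...]"""
    parts = urlparse(url).path.split("/")
    # A path of "/ipfs/<CID>/..." splits into ["", "ipfs", "<CID>", ...]
    if len(parts) < 3 or parts[1] != "ipfs" or not parts[2]:
        raise ValueError(f"not a path-style IPFS gateway URL: {url}")
    return parts[2]

def via_gateway(cid: str, host: str) -> str:
    """Build a gateway URL for the same content on a different host."""
    return f"https://{host}/ipfs/{cid}"

stored = "https://gateway-one.example/ipfs/PLACEHOLDER_CID"
cid = cid_from_gateway_url(stored)
print(cid)
print(via_gateway(cid, "gateway-two.example"))
```

Software that knows this convention can recover the CID and ask a different gateway or a local node. An ordinary browser following the stored link cannot, which is why the named gateway remains a point of failure.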

Pointing to the NFT's metadata and content using IPFS URIs assures their validity but does it assure their existence? The IPFS documentation's section Persistence, permanence, and pinning explains:
Nodes on the IPFS network can automatically cache resources they download, and keep those resources available for other nodes. This system depends on nodes being willing and able to cache and share resources with the network. Storage is finite, so nodes need to clear out some of their previously cached resources to make room for new resources. This process is called garbage collection.

To ensure that data persists on IPFS, and is not deleted during garbage collection, data can be pinned to one or more IPFS nodes. Pinning gives you control over disk space and data retention. As such, you should use that control to pin any content you wish to keep on IPFS indefinitely.
To assure the existence of the NFT's metadata and content, they must both be not just written to IPFS but also pinned to at least one IPFS node.
To ensure that your important data is retained, you may want to use a pinning service. These services run lots of IPFS nodes and allow users to pin data on those nodes for a fee. Some services offer free storage-allowance for new users. Pinning services are handy when:
  • You don't have a lot of disk space, but you want to ensure your data sticks around.
  • Your computer is a laptop, phone, or tablet that will have intermittent connectivity to the network. Still, you want to be able to access your data on IPFS from anywhere at any time, even when the device you added it from is offline.
  • You want a backup that ensures your data is always available from another computer on the network if you accidentally delete or garbage-collect your data on your own computer.
Thus, to assure the existence of the NFT's metadata and content, pinning must be rented from a pinning service, another single point of failure.
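The interaction between caching, garbage collection and pinning can be illustrated with a toy model. This is not the IPFS API; `ToyNode` and its methods are invented for the sketch, and a real node decides when to collect based on its own storage limits:

```python
class ToyNode:
    """A toy model of one node's storage: cached blocks may be collected, pinned ones may not."""

    def __init__(self):
        self.blocks = {}   # content name -> bytes
        self.pins = set()  # names protected from garbage collection

    def add(self, name: str, data: bytes, pin: bool = False) -> None:
        self.blocks[name] = data
        if pin:
            self.pins.add(name)

    def garbage_collect(self) -> None:
        """Drop every block that is not pinned."""
        self.blocks = {n: d for n, d in self.blocks.items() if n in self.pins}

    def has(self, name: str) -> bool:
        return name in self.blocks

node = ToyNode()
node.add("metadata", b"{...}", pin=True)
node.add("image", b"image bytes")  # written but never pinned

node.garbage_collect()
print(node.has("metadata"), node.has("image"))
```

In this model the metadata survives collection and the image does not. If no other node happens to hold the image, the NFT's second link now leads nowhere.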

In summary, it is possible to take enough precautions and pay enough ongoing fees to be reasonably assured that your $69M NFT and its metadata and the JPEG it refers to will remain accessible. In practice, these precautions are definitely not always taken. David Gerard reports:
But functionally, IPFS works the same way as BitTorrent with magnet links — if nobody bothers seeding your file, there’s no file there. Nifty Gateway turn out not to bother to seed literally the files they sold, a few weeks later. [Twitter; Twitter]
Anil Dash claims to have invented, with Kevin McCoy, the concept of NFTs referencing Web URLs in 2014. He writes in his must-read NFTs Weren’t Supposed to End Like This:
Seven years later, all of today’s popular NFT platforms still use the same shortcut. This means that when someone buys an NFT, they’re not buying the actual digital artwork; they’re buying a link to it. And worse, they’re buying a link that, in many cases, lives on the website of a new start-up that’s likely to fail within a few years. Decades from now, how will anyone verify whether the linked artwork is the original?

All common NFT platforms today share some of these weaknesses. They still depend on one company staying in business to verify your art. They still depend on the old-fashioned pre-blockchain internet, where an artwork would suddenly vanish if someone forgot to renew a domain name. “Right now NFTs are built on an absolute house of cards constructed by the people selling them,” the software engineer Jonty Wareing recently wrote on Twitter.
My only disagreement with Dash is that, as someone who worked on archiving the "old-fashioned pre-blockchain internet" for two decades, I don't believe that there is a new-fangled post-blockchain Internet that makes the problems go away. And neither does David Gerard:
The pictures for NFTs are often stored on the Interplanetary File System, or IPFS. Blockchain promoters talk like IPFS is some sort of bulletproof cloud storage that works by magic and unicorns.
Update 22nd May 2021: At least one artist understands NFTs. Tip of the hat to David Gerard.

Update 28th May, 2021: Kimberly Parker's Most artists are not making money off NFTs and here are some graphs to prove it has some numbers on the NFT market during the mania which illustrate the extreme Gini coefficients endemic to cryptocurrencies:
The first time you sell an NFT it’s called a Primary Sale. Everything after that is called a Secondary Sale. According to this, 67.6% of Sales have not had a Secondary Sale and 19.5% have had one Secondary Sale. ...
  • 33.6% of Primary Sales were $100 or less
  • 20.0% of Primary Sales were $100-$200
  • 11.1% of Primary Sales were $200-$300
  • 7.7% of Primary Sales were $300-$400
  • 3.9% of Primary Sales were $400–500
  • 3.3% of Primary Sales were $500-$600
  • 2.5% of Primary Sales were $600–$700
Most NFT sites will recommend you set your sale price at 0.5 ETH, which was about $894 USD on March 19th. The number of Primary Sales that ended up selling for the recommended price was a whopping 1.8%.

The largest group of Primary Sales (34%) were for $100 or less. For $100, you can expect to have 72.5% — 157.5% of your Sale deducted by fees*. That’s an average(!) of 100.5%, leaving you with a $0.50 deficit or more.

The next biggest group of Primary Sales (20%) were for $100-$200. For $200, you can expect to have 37.5–80% of your Sale deducted by fees*. That’s an average(!) of 54%, leaving you with $92 or less.

The next biggest group of Primary Sales (11%) were for $200-$300. For $300, you can expect to have 25.8% — 54.2% of your Sale deducted by fees*. That’s an average(!) of 38.5%, leaving you with $223 or less.


David. said...

Kal Raustiala & Christopher Jon Sprigman's The One Redeeming Quality of NFTs Might Not Even Exist explains:

"once you understand what the NFT is and how it actually works, you can see that it does nothing to permit the buyer, as the New Yorker put it, to own a “digital Beanie Baby” with only one existing copy. In fact, the NFT may make the authenticity question even more difficult to resolve."

They quote David Hockney agreeing with David Gerard:

"On an art podcast, Hockney recently said, “What is it that they’re owning? I don’t really know.” NFTs, Hockney said, are the domain of “international crooks and swindlers.”

Hockney may have a point. If you look at them closely, NFTs do almost nothing to guarantee authenticity. In fact, for reasons we’ll explain, NFTs may actually make the problem of authenticity in digital art worse."

David. said...

Who could have predicted counterfeit NFTs? Tim Schneider's The Gray Market: How a Brazen Hack of That $69 Million Beeple Revealed the True Vulnerability of the NFT Market (and Other Insights) reports that:

"In the opening days of April, an artist operating under the pseudonym Monsieur Personne (“Mr. Nobody”) tried to short-circuit the NFT hype machine by unleashing “sleepminting,” a process that complicates, if not corrodes, one of the value propositions underlying non-fungible tokens.
Sleepminting enables him to mint NFTs for, and to, the crypto wallets of other artists, then transfer ownership back to himself without their consent or knowing participation. Nevertheless, each of these transactions appears as legitimate on the blockchain record as if the unwitting artist had initiated them on their own, opening up the prospect of sophisticated fraud on a mass scale."

And it is arguably legal because NFTs are just a (pair of) links:

"Personne told me that, after being “thoroughly consulted and advised by personal lawyers and specialist law firms,” he is confident there are “little to no legal repercussions for sleepminting.” His argument is that ERC721 smart contracts only contain a link pointing to a JSON (Javascript Object Notation) file, which in turn points to a “publicly available and hosted digital asset file”—here, Beeple’s Everydays image."

David. said...

Who could have predicted that NFTs would be the roach motel of money? Jon Sarlin reports that NBA Top Shot customers can't get their money out. Experts are confounded:

"NBA Top Shot is the hottest NFT marketplace on the planet. It's also got a big problem: Customers are complaining about exceptionally long wait times to get paid from sales of digital tokens that can often cost hundreds of thousands of dollars.

NBA Top Shot is the online marketplace where investors can spend hundreds to hundreds of thousands of dollars on unique, digitized moments from pro basketball. Its astonishing growth was fueled by being in the right place at the right time: having licensed the NBA brand amid a sudden mainstream interest in NFTs, or non-fungible tokens."

David. said...

In Cryptocurrency in 2021: still dysfunctional nonsense, unusable by normal humans, David Gerard recounts the "user experience" of a normal person wanting to give someone an NFT as a gag gift. They:

"bought some ether ... they put it into Metamask (the most popular Ethereum wallet), and then followed a guide on how to buy an NFT. Their transaction fee was low enough that the transaction took three days to go through — but at least it did eventually go through, and they secured ownership of their NFT.

The NFT and the excess ether are sitting there in their Metamask. Moving the ether out would be another expensive transaction — today’s average fee is $20.71, it was $60 a couple of days ago ... so they’re just going to give the recipient the Metamask wallet, private keys and all. With some ether that’s too small a quantity to move out of the wallet.

If a normal person trying to use Ethereum guesses wrong about today’s transaction fee, the transaction fails and their fee literally goes up in smoke. If your Ethereum transaction fails, then the ETH or the NFT doesn’t move — but you still lose the fee. Because it’s a computation that failed.

So try again! But guess a bit higher or something."

David. said...

Kimberly Parker's Most artists are not making money off NFTs and here are some graphs to prove it is a fascinating exercise in the way averages and extreme Gini coefficients interact:

"what is the median sale price of an NFT, and how does that compare to the average being displayed on these sites? Selling NFTs has been charitably described by some as like gambling. Surely someone could take the sales information from these sites and calculate a median, and that would help an artist have a much better idea of how much money they were likely to make."

The graphs are amazingly skewed, so much so that:

"These numbers do not show the democratization of wealth thanks to a technological revolution. They show an acutely minuscule number of artists making a vast amount of wealth off a small number of sales while the majority of artists are being sold a dream of immense profit that is horrifically exaggerated. Hiding this information is manipulative, predatory, and harmful, and these NFT sites have a responsibility to surface all this information transparently. Not a single one has."

I'm shocked, shocked to find manipulation and deception going on in cryptocurrency markets.

David. said...

Cristina Criddle, a technology reporter for the BBC, tried to purchase a Cryptokitties NFT for £13. It took three days and more than £30.

David. said...

Immutability FTW! Tim de Chant reports that The Tim Berners-Lee NFT that sold for $5.4M might have an HTML error:

"Two weeks ago, World Wide Web creator Tim Berners-Lee sent an NFT of the web’s original source code to the auction block with a starting bid of just $1,000. Yesterday, Sotheby’s announced that the crypto asset sold for $5.4 million. The sum makes Berners-Lee’s work one of the priciest NFTs of all time.

The digital package included not just the source code but also a letter from Berners-Lee reflecting on the creation of the web, some original HTML documents, an SVG “poster” of thousands of lines of code, and a 30-minute visualization of the code being typed on a screen."

@mikko tweeted:

"Hold on...the www source that Sotheby is auctioning? The angle brackets are wrong! They've been - yes - HTML encoded from "< >" to "< >". Lol."

That's OK - we all make mistakes. But then:

"The code was corrected in later animations, raising questions about this particular NFT and NFTs as a whole. It’s unclear whether the video posted on the listing page for the auction was pulled directly from the animation originally included in the NFT, and we’ve reached out to Sotheby’s for clarification. But if it was, and if the later, corrected animation reflects what was actually sold, it could mean that the original NFT was scrapped and a new one was created."

David. said...

The last few bullets of Amy Castor's Notes on NFTs, the high-art trade, and money laundering are:

* In March, the Financial Action Task Force, a Paris-based AML watchdog, issued a draft updated virtual asset guidance, which could have implications for NFTs.
* In its draft, the FATF doesn’t specifically call out NFTs, but it replaces an earlier phrasing of “assets that are fungible” with “assets that are convertible and interchangeable” in describing the kinds of virtual assets that need regulation. (NFTs are convertible when you sell them for other forms of crypto.)
* This subtle change in language directly targets NFTs (and DeFi as well).
* If the US adopts the final guidance — which it most likely will — those subtle changes in wording give FinCEN the authority to regulate not only existing virtual currencies but also emerging asset classes such as NFTs.
* Additionally, NFTs could be considered art and NFT marketplaces could be considered art auction houses and get included in new BSA laws.
* Like high-art, NFTs hit all the right targets for money laundering.

David. said...

Matthew Gault and Jason Koebler report on "link rot" in A Defunct Video Hosting Site Is Flooding Normal Websites With Hardcore Porn:

"As pointed out by Twitter user @dox_gay, hardcore porn is now embedded on the pages of the Huffington Post, New York magazine, The Washington Post, and a host of other websites. This is because a porn site called 5 Star Porn HD bought the domain for Vidme, a brief YouTube competitor founded in 2014 and shuttered in 2017.
Seemingly any embeds now redirect to the 5 Star Porn HD homepage. The site also redirects there. For example, if you check out this New York magazine article about former House Majority leader John Boehner's "creepy kissy face," you will see photos of Boehner but also images of a man with a gigantic penis fucking a woman."

David. said...

The fact that NFTs are just a link, and that the target of the link is malleable, leads to scams like the one reported by Joe Tidy in Fake Banksy NFT sold through artist's website for £244k:

"A hacker has returned $336,000 to a British collector after he tricked him into buying a fake Banksy NFT advertised through the artist's official website.

A link to an online auction for the NFT was posted on a now-deleted page of

The auction ended early after the man offered 90% more than rival bidders.

Banksy's team told the BBC "any Banksy NFT auctions are not affiliated with the artist in any shape or form"."

Al said...

This article was fascinating.

Wanted to enquire, a lot has changed in the last 2 years - especially with this NFT's and Crypto Crashing.

Are your views still similar? I saw a whole lot of people saying their NFT's weren't 'IPFS', did they fix this issue?

Is there any case to be made that some of the promises could be delivered in the near future now the ridiculousness is over?

David. said...

Goblintown NFT images all changed to an illustrated middle finger in protest about royalties by Molly White starts with some basic economics:

"There has been an ongoing controversy in the NFT world over creator royalties. Although NFTs are often talked up as being good for artists because they enable royalties to be paid even after the initial sale, these payments are rarely enforced by the smart contract and are instead up to marketplaces to enforce. In the last six months or so, NFT marketplaces have emerged that follow a "royalty optional" model, sparking a race to the bottom where OpenSea and other incumbents have also cut royalty protections to remain competitive."

And follows with some basic ignorance:

"Some people were horrified by the fact that NFTs that they owned could be changed after the fact without their consent, a fact they were not previously aware of. One owner wrote, "So your telling me I spent $1,000s of dollars and have 10 goblintowns for them all to now be dudes shaking their weiners?"