Tuesday, August 30, 2016

Fighting the Web Flab

Source: Frederic Filloux
At Monday Note, Frederic Filloux's Bloated HTML, the best and the worse starts where I've started several times, with the incredibly low density of actual content in today's Web:
When reading this 800 words Guardian story — about half of page of text long — your web browser loads the equivalent of 55 pages of HTML code, almost half a million characters. To be precise: an article of 757 words (4667 characters and spaces), requires 485,527 characters of code ... “useful” text (the human-readable article) weighs less than one percent (0.96%) of the underlying browser code. The rest consists of links (more than 600) and scripts of all types (120 references), related to trackers, advertising objects, analytics, etc.
But he ends on a somewhat less despairing note. Follow me below the fold for a faint ray of hope.

Filloux continues:
In due fairness, this cataract of code loads very fast on a normal connection.
His "normal" connection must be much faster than my home's 3Mbit/s DSL. But then the hope kicks in:
The Guardian technical team was also the first one to devise a solid implementation of Google's new Accelerated Mobile Page (AMP) format. In doing so, it eliminated more than 80% of the original code, making it blazingly fast on a mobile device.
Great, but AMP still delivers about 20 bytes of crud for each byte of content. What's the word for 20 times faster than "blazingly"? The Accelerated Mobile Pages project has three components. First, some JavaScript that:
implements all of AMP's best performance practices, manages resource loading and gives you [custom tags], all to ensure a fast rendering of your page. Among the biggest optimizations is the fact that it makes everything that comes from external resources asynchronous, so nothing in the page can block anything from rendering. Other performance techniques include the sandboxing of all iframes, the pre-calculation of the layout of every element on page before resources are loaded and the disabling of slow CSS selectors.
Second, among the things the JavaScript implements are extended HTML tags that can be used to improve performance:
some HTML tags are replaced with AMP-specific tags (see also HTML Tags in the AMP spec). These custom elements, called AMP HTML components, make common patterns easy to implement in a performant way.
Finally, Google supports the use of AMP with a proxy cache that:
fetches AMP HTML pages, caches them, and improves page performance automatically. When using the Google AMP Cache, the document, all JS files and all images load from the same origin that is using HTTP 2.0 for maximum efficiency.
The cache also validates the pages it caches, confirming that:
the page is guaranteed to work, and that it doesn't depend on external resources. The validation system runs a series of assertions confirming the page’s markup meets the AMP HTML specification.
Source: Frederic Filloux
Filloux then reports some interesting comparisons:
As an admittedly biased reference point, I took one of the first texts, World Wide Web Summary, written in HMTL by its inventor Tim Berners-Lee. Published in 1991, it probably is one of the purest, most barebones forms of hypertext markup language: less that 4200 characters of readable text for less that 4600 characters of code. That’s a 90% usefulness rate as shown in the table below (you can also refer to my original Google Sheet here, to get precise numbers, stories URLs and formulae).
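For anyone who wants to try a similar measurement on their own pages, here is a minimal sketch in Python. It is not Filloux's method, which his post does not spell out in detail. It assumes the "usefulness rate" is simply human-readable characters divided by total characters of HTML, and it counts all text outside script and style elements, so navigation links and captions inflate the numerator compared with his article-only count. The names VisibleText and usefulness_rate are made up for this illustration.

```python
from html.parser import HTMLParser


class VisibleText(HTMLParser):
    """Collects text found outside <script> and <style> elements."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when we are not inside a skipped element.
        if self._skip_depth == 0:
            self.parts.append(data)


def usefulness_rate(html: str) -> float:
    """Readable characters divided by total characters of markup.

    Assumption: runs of whitespace are collapsed to a single space
    before counting, which is one of several reasonable choices.
    """
    if not html:
        return 0.0
    parser = VisibleText()
    parser.feed(html)
    parser.close()
    text = " ".join("".join(parser.parts).split())
    return len(text) / len(html)


# Sanity check against the figures quoted for the Guardian story.
guardian_rate = 4667 / 485_527
print(f"{guardian_rate:.2%}")  # prints 0.96%
```

Feeding the function the saved source of a page gives a rough upper bound on its usefulness rate, since everything visible is counted as content.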
The table (click on the image above) is interesting for the wide range of "usefulness rate", from 91% to 1%:
The big surprise (at least for me) comes from the Progressive Web App implemented by the Washington Post. The Plain HTML page offers roughly the same content as the PWA version, but with a huge gain in HTML size.
The Washington Post PWA page uses less than one-tenth as many bytes to deliver equivalent content. That's double the improvement The Guardian got with AMP. Progressive Web Apps are a technique created by Google about a year ago for building Web pages that use local storage, service workers and asynchronous behavior to provide app-like user experiences:
Google is just starting to promote the PWA on a large scale and the tools are already available. ... Because it supports Push notifications and other features until now reserved to native apps, PWA has great potential for publishers
PWA is clearly capable of impressive performance gains, with only about 1 byte of crud for each byte of content. Filloux is equivocal about the prospects for AMP and PWA. Although Google has ways of punishing sites that don't get with the program, I'm more pessimistic. The tools people use to generate their pages emit HTML that is just brain-dead (e.g. the same enormous <div> specification on adjacent phrases). Only people who simply don't care could put out stuff like this.
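The "bytes of crud per byte of content" figures used in this post are back-of-the-envelope arithmetic on Filloux's numbers. A small Python sketch of that arithmetic follows. It assumes AMP removes exactly 80% of the original code (the quote says "more than 80%", so the real ratio is somewhat lower) and that the article text is unchanged. The function name crud_per_content_byte is made up for this illustration.

```python
def crud_per_content_byte(content_chars: float, total_chars: float) -> float:
    """Characters of non-content markup per character of article text."""
    return (total_chars - content_chars) / content_chars


# Figures quoted from Filloux for the Guardian story.
guardian_content = 4667
guardian_total = 485_527

# Full page: roughly 103 characters of crud per character of content.
full_page = crud_per_content_byte(guardian_content, guardian_total)

# Assumption: AMP strips exactly 80% of the code and keeps all the text.
amp_total = guardian_total * 0.20
amp_page = crud_per_content_byte(guardian_content, amp_total)

print(f"full page: {full_page:.1f}, AMP: {amp_page:.1f}")
```

Under those assumptions the AMP page comes out at roughly 20 characters of crud per character of content, against roughly 103 for the full page.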

[Updated to correct ungrammatical sentence]
