Sunday, December 30, 2007

Mass-market scholarly communication revisited

Again, I need to apologize for the gap in posting. It turns out that I'm not good at combining travel and blogging, and I've been doing a lot of travel. One of my trips was to DC for CNI and the 3rd International Digital Curation Conference. One of the highlights was a fascinating talk (abstract) by Prof. Carole Goble of the University of Manchester's School of Computer Science. She's a great speaker, with a vivid turn of phrase, and you have to like a talk about science on the web in which a major example is VivaLaDiva.com, a shoe shopping site.

Carole and her team work on enabling in silico experiments by using workflows to compose Web and other services. Their myGrid site uses their Taverna workflow system to provide scientists with access to over 3000 services. Their myExperiment "scientists social network and Virtual Research Environment encourages community-wide sharing, and curation, of workflows".

Two things I found really interesting about Carole's talk were:

  • myExperiment is an implementation of the ideas I discussed in my post on Mass-Market Scholarly Communication, enhanced with the concepts of workflows.

  • The emerging world of web services is the big challenge facing digital preservation. Her talk was a wonderful illustration both of why this is an important problem, in that much of readers' experience of the Web is already mediated by services, and of why the barriers to preserving those services are almost insurmountable.


Carole's talk was like a high-speed recapitulation of the history of the Web, with workflows taking the place of pages. More generally, it was an instance of the way Web 2.0 evolution is like Web 1.0 evolution with services instead of static content. Carole described how scientists discovered that they could link together services (pages) using workflows (links). There were soon so many services that directories of them arose (think the early Yahoo!). Then there were so many of them that search engines arose. Then enough time elapsed that people started noticing that workflows (links) decayed quite rapidly over time. There was, however, one important piece of Web 1.0 missing from her presentation - advertising. Follow me below the jump for an explanation of why this omission is important and some suggestions about what can be done to remedy it.

Advertising has played a much underestimated role in driving the evolution of the Web. It provides direct, almost instantaneous, feedback that rewards successful mutations. Because the feedback is monetary, it ensures that successful mutations will get the resources they need to thrive, and unsuccessful ones will not. We take for granted that a Web site that attracts a large readership will be able to sustain itself and grow, but the only reason this is so is advertising. Daily Kos, a political blogging site, was started on a shoestring in 2002 and hit a million page views a month after its first year. It now gets 15 million a month, consumes a respectable-size server farm and employs a growing staff. Darwin would recognize this process instantly.

Despite a 1998 harangue from Tim Berners-Lee, research showed that pages had a half-life of a month or two, and that even links in academic journals decayed rapidly. But that was before advertising effects had been felt. Now, it may still be true that pages have a short half-life. But pages that the readership judges to be important, and which thus bring in advertising dollars, do not decay rapidly. Nor do the links that bring traffic to them. Site administrators have been schooled by advertising's reward and punishment system, and they know that gratuitously moving pages breaks links and impairs search engine rankings, which decreases income. So the problem of decaying links has been solved, not by persistent URL technology but by rewarding good behavior and punishing bad behavior.

Although the analogy between Web 1.0 pages and Web 2.0 workflows was apparent from Carole's talk, there is one big difference. Web 1.0 pages are read by people, whom advertisers will pay to reach. Workflows are read by programs, whose discretionary spending is zero. There's thus no effective mechanism rewarding good behavior in the workflow world and punishing bad. Suppose that putting a service on-line that attracted a large workflow-ership rapidly caused a flow of money to arrive at the site hosting the service, sufficient to sustain and grow it. Many of the problems currently plaguing workflows would vanish overnight. Site maintainers would find, for example, that a poor availability record or non-upwards-compatible changes to their site's API would rapidly reduce their income, and being the smart young scientists they are, would learn not to do these things.

Funding agencies and others interested in the progress of e-science need an equivalent of advertising to drive the evolution of services and workflows. Without it the field will continue to be plagued by poor performance, fragility, unreliability and instability, with much effort being wasted. More critically, one major key to scientific progress is the requirement that experiments be replicable by later researchers. The current workflow environment appears to provide an almost total inability to replicate experiments after a period no matter how well they were published.

What key aspects of Web advertising are needed in a system to drive the evolution of scientific workflows?

  • It must provide money directly to the maintainers of the services and workflows.

  • The amount of money must be based on automated, third-party measures of usage (think AdSense or DoubleClick or even SiteMeter for scientific workflows) and importance (think PageRank for scientific workflows).

  • The cycle time of the reward system must match the Web, not the research funding process. myExperiment has been in public beta less than six months. In that time it has evolved significantly. A feedback process that involves writing grant proposals, having them peer-reviewed, and processed through an annual budget cycle is far too slow to have any effect on the evolution of a workflow environment.

Funders need to put some infrastructure money into a pot that is doled out automatically via these measures. Doing so will pay great benefits in scientific productivity.
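To make the idea of a pot "doled out automatically" concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the names ServiceStats and allocate_pot, the idea of blending a usage share with an importance share, and the 50/50 default weighting are mine, not part of myExperiment, Taverna or any funder's actual scheme.

```python
from dataclasses import dataclass


@dataclass
class ServiceStats:
    """Hypothetical third-party measurements for one service or workflow."""
    name: str
    invocations: int    # measured usage during the payout period
    importance: float   # e.g. a PageRank-style score (any non-negative number)


def allocate_pot(pot, stats, usage_weight=0.5):
    """Split `pot` among services in proportion to a blended score.

    Each service's share is a weighted mix of its fraction of total
    usage and its fraction of total importance, so the shares sum to 1.
    """
    total_usage = sum(s.invocations for s in stats)
    total_importance = sum(s.importance for s in stats)
    if total_usage == 0 or total_importance == 0:
        # Nothing measurable to reward in this period.
        return {s.name: 0.0 for s in stats}

    payouts = {}
    for s in stats:
        usage_share = s.invocations / total_usage
        importance_share = s.importance / total_importance
        share = usage_weight * usage_share + (1 - usage_weight) * importance_share
        payouts[s.name] = pot * share
    return payouts


if __name__ == "__main__":
    # Made-up numbers, purely to show the mechanics.
    period = [
        ServiceStats("alignment-service", invocations=9000, importance=0.6),
        ServiceStats("annotation-service", invocations=1000, importance=0.4),
    ]
    for name, amount in allocate_pot(10_000.0, period).items():
        print(f"{name}: {amount:.2f}")
```

The point of the sketch is the cycle time: a function like this could be run monthly or even daily against fresh measurements, with no proposal or review step in the loop.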

2 comments:

simonfj said...

Hmm,
I liked the comments
"So the problem of decaying links has been solved, not by persistent URL technology but by rewarding good behavior and ..." (we'll leave out 'punishing bad behaviour', as that doesn't happen in academia).

We'd better back up a bit first though, don't you think? What brings in the advertising? Answer: the quality and number of eyeballs. And these days the eyeballs are interactive ones, so we're talking about, in the first instance, conversations; which aren't often seen to happen in academic communities due to their need to stay up with the latest media fashion = blogs.

As much as I like the 'aggregating workflow' idea, it's pretty remote from the reality of Web 2.0 services (as this community calls them), which are based on providing a specific service or tool. Like this blog, they are about giving people the means to create.

You asked, "What key aspects of Web advertising are needed in a system to drive the evolution of scientific workflows?" That's pretty easy, you have to aggregate the conversations of the members of a discipline or profession, in an environment which enables insiders to get on and not be interrupted too often, and for outsiders, when they ask a question (at set times) get an answer.

This is trying to get out of the woodwork of course. You only have to look at the eyeballs on Chris's forums. http://forum.dcc.ac.uk/viewforum.php?f=18
Not a great design, but an attempt.

Until the kind of comments from your blog appear on a thread and a conversation begins to be seen to take place, all we can do is hope that all those nice people being rewarded for outmoded habits, some of which are described on another (community member's?) blog
http://ksulib.typepad.com/conferences/2007/12/digital-curat-2.html
won't go bats going round in circles.

Maybe it's too late.

andrew123 said...

You asked, "What key aspects of Web advertising are needed in a system to drive the evolution of scientific workflows?" That's pretty easy, you have to aggregate the conversations of the members of a discipline or profession, in an environment which enables insiders to get on and not be interrupted too often, and for outsiders, when they ask a question (at set times) get an answer.