Thursday, February 1, 2024

The Stanford Digital Library Project

The Stanford Digital Library Project stated its goal thus:
The Stanford Integrated Digital Library Project will develop enabling technologies for an integrated “virtual” library to provide an array of new services and uniform access to networked information collections. The Integrated Digital Library will create a shared environment linking everything from personal information collections, to collections of conventional libraries, to large data collections shared by scientists.
Stanford librarians Vicky Reich and Rebecca Wesley provided the "library" input for the research.

Wayback Machine, 11/11/98
In particular, Vicky explained citation indices, the concept behind PageRank, to Larry Page and Sergey Brin. Andy Bechtolsheim was famously instrumental in persuading them to turn their demo of a PageRank search engine into Google, the company. In his fascinating interview in the Computer History Museum's oral history collection, Andy explains why the idea of ranking pages by their inbound links was so important.

Below the fold I have taken the liberty of transcribing and cleaning up the relevant section of Andy's stream of consciousness, both because it is important history and because it exactly reflects the Andy I was privileged to know in the early days of Sun Microsystems.

This rough transcript runs from [47:47] to [53:35]. Andy speaks:

The true story is that I met the founders of Google before the company existed because they were, like I was, a student at Stanford. I'm not sure Larry was really ready to abandon the PhD program and jump into this. So I was in this same position once myself and I used the same line "you'll always finish your PhD later". Now the concern I had is that if they didn't get going this great idea may be not happening.

Of course I was involved in the company itself but it was really really a very good idea, one of the best ideas I've ever seen. This notion of relevant search and relevant ads and this business model that solved it. Put it this way, I was very familiar with scientific publishing where what matters is not how many papers you write but how many people cite your papers. So if you apply the same thing to the Web clearly what is relevant is what other people link to and notice. You could automatically build a graph, a structure that said what's more important than others.

At the time people didn't think automatic search was actually possible because Alta Vista which was popular at the time just looked at keywords. What people would do is they would add the whole dictionary as a dark page behind the document. Since every word you are looking for is in the dictionary you couldn't find anything any more because every document had every word. That wasn't a path to success; you couldn't actually look at the document because it could just be spam.

Yahoo believed fairly strongly, they actually had at one point talked to Larry and Sergey and maybe even Larry was offering them to sell them the idea but Yahoo passed on the belief that you just couldn't do it. They really thought you could hire people, like newspaper editors, they would make the sports section and the garden section like a newspaper; select content from the Web that then they would present to the front page. Clearly that was not going to scale if there were millions of Web pages you just couldn't keep it up.

So Larry and Sergey believed very strongly that it had to be automatic and that if that wasn't possible — it was the only way to do it basically. That first demo which they had on a laptop was actually quite compelling it looked the same as before — here's the lucky button. The only worry was they wanted to sort of demo they could scale the search engine to like a couple of racks of computers before they would raise the real venture and there was some question whether it was scalable. For that they just wanted to raise a small amount of money to build the first couple of racks with motherboards.

The money that I put in there on day one, which was actually before the company started, helped to demonstrate that. So I bought them this check. They had the name of the company but the check was to the name of the company that didn't exist. At the time some of the law firms were so busy they didn't want to take on new clients that didn't have some funding behind them. I figured if I write them a check it would help them to get the right firm. But I can't claim any credit for what they've done, it's all due to the insight they had and the team they built.

Let me back up here. I couldn't find stuff on Alta Vista so I was desperate for better search. A lot of my time was looking for data sheets and information I was looking for and if I couldn't find anything on the Web the Web wouldn't be very useful. Like, how do you find stuff? It's the most important thing. So for me it was a personal goal to have something that would actually work.

But in any new company the first question is "what's your business model?" Even at the time they had this model of sponsored links that would take your search query and link it to this ad inventory. I asked them "how much is it per click?" and they said 5 cents per click which is still their bottom price today - this is before they got into the competitive bidding - and I did this math in the back of my mind "a million clicks a day and 5 cents a click is $50K a day" — they can't go broke.

I had no idea how this would scale, in effect I don't think anybody understood this but it was clear that there was enough interest in finding the people who were looking for stuff. Let's say you search for a tennis racket. That means you're probably a tennis player and most likely there will be an ad that shows you tennis rackets or tennis balls or something that relates to your interest. And the key was you have an unlimited ad inventory, instead of having these banner ads that I have never clicked on in my life, maybe once.

Banner ads are a waste of bandwidth essentially. These ads were highly more relevant and even today except for spam mail it's the most cost-effective way of advertising or finding customers. It took advantage of the fact that the Internet is a full-duplex communication path whereas banner ads were more like TV: here's your ad break and here's what you have to consume before you get to the next page.

Google was an absolutely brilliant idea but funnily enough their business really took off after the dot-com implosion in 2001. If you look at their historical revenue what happened at that point I believe was that people realized that what they spent on banner ads was basically wasted and they got back to an ROI calculation of where do we apply money more cost-effectively and, yeah, so Google. Once people started bidding for the keywords of course the price per click went up but it still provides very very good value for advertisers.


David. said...

Vicky Reich writes:

"One afternoon, I told the team (including Larry Page and Sergey Brin) about HighWire Press providing readers “toll free access” to cited journal articles. And, what turned out to be more important, the detailed workings of the Science Citation Index.

After a bit, Sergey and Larry presented Backrub to the SIDLP team, Backrub was the Google precursor."

David. said...

In The Anatomy of a Large-Scale Hypertextual Web Search Engine, Brin and Page explain PageRank thus:

"Academic citation literature has been applied to the web, largely by counting citations or backlinks to a given page. This gives some approximation of a page’s importance or quality. PageRank extends this idea by not counting links from all pages equally, and by normalizing by the number of links on a page."