I can’t think of a single large software company that doesn’t regularly draw internet comments of the form “What do all the employees do? I could build their product myself.” Benjamin Pollack and Jeff Atwood called out people who do that with Stack Overflow. But Stack Overflow is relatively obviously lean, so the general response is something like “oh, sure maybe Stack Overflow is lean, but FooCorp must really be bloated”. And since most people have relatively little visibility into FooCorp, for any given value of FooCorp, that sounds like a plausible statement. After all, what product could possible require hundreds, or even thousands of engineers?
A few years ago, in the wake of the rapgenius SEO controversy, a number of folks called for someone to write a better Google. Alex Clemmer responded that maybe building a better Google is a non-trivial problem. Considering how much of Google’s $500B market cap comes from search, and how much money has been spent by tens (hundreds?) of competitors in an attempt to capture some of that value, it seems plausible to me that search isn’t a trivial problem. But in the comments on Alex’s posts, multiple people respond and say that Lucene basically does the same thing Google does and that Lucene is poised to surpass Google’s capabilities in the next few years.
What would Lucene at Google’s size look like? If we do a naive back of the envelope calculation on what it would take to index a significant fraction of the internet (often estimated to be 1 trillion (T) or 10T documents), we might expect a 1T document index to cost something like $10B1. That’s not a feasible startup, so let’s say that instead of trying to index 1T documents, we want to maintain an artisanal search index of 1B documents. Then our cost comes down to $12M/yr. That’s not so bad – plenty of startups burn through more than that every year. While we’re in the VC-funded hypergrowth mode, that’s fine, but once we have a real business, we’ll want to consider trying to save money. At $12M/yr for the index, a 3% performance improvement that lets us trim our costs by 2% is worth $360k/yr. With those kinds of costs, it’s surely worth it to have at least one engineering working full-time on optimization, if not more.