February 21, 2005

Don't be evil  

Mithras clued me in Saturday to the big change in Google's ranking algorithm last week. It's still not clear exactly what the change was, but the result is that most blogs have experienced dramatic falls in their Google search placement. These falls seem to be independent of Google page rank and are affecting some blogs differently than others. In my case, this blog is no longer the first hit for "locussolus" or "paul goyette" -- enter the latter right now and you won't find this page in the top 100 results. I find this irksome, but it's not really that big a deal; the people who read this site usually didn't find it through a Google search, and the people who do find it through a Google search were probably looking for something else. (Of course, this is not the case for all blogs.)

The comments and trackbacks to Mithras's post contain a number of hypotheses about the possible mechanics and rationales behind this change, but it's clearly related to Google's attempts to thwart comment spammers. It may also have something to do with the new nofollow attribute, but that can't be the only explanation since not all sites have been affected equally (in particular, many self-hosted sites don't seem to have been affected at all). At any rate, I'm still trying to come to grips with the consequences and implications of this change; I'm going to try to work through them here.

Search engines, and in particular Google, are really becoming the gatekeepers for the world's information. Not all information can be found through Google, but things are definitely moving in that direction -- the decisions to encode major libraries in the United States (and by the way the rest of the world won't be left out for long) and to make academic papers available through Google Scholar are part of a grand and somewhat utopian scheme to make all the world's information available to everyone -- or at least, all the information that's not under copyright!

But Google's growing status as gatekeeper for the world's information means that it makes crucial decisions about access. Not all information is equally important or equally relevant to a particular query, so information is ranked. This ranking may just be a matter of convenience when you're tweaking your algorithm in the late '90s, but when literally everything is online and the internet is the medium for public discourse, it becomes a question of power, influence, and quiet marginalization. Surveying the situation today, I have no doubt that this is where we're headed.

What's so magical about the folks at Google is that even when they were tweaking their algorithm back in the late '90s, they foresaw the potential for these deep issues of speech and access. So instead of relying exclusively on content analysis, they built their model to incorporate the implicit views of the internet's readers and writers: they counted links, and they used their count to estimate a given site's authority on a particular search term, or in general. This was a profound and elegant achievement. Yes, it made search more accurate. But more than that, it codified the web's already democratic ethos -- tying search results to the actions of writers demystified search and gave content creators more power in the form of links. And while Google's new algorithm was somewhat prone to manipulation -- that's what's precipitated this whole crisis -- the very reason it could be manipulated was that it was transparent.

Today we have the blog, a phenomenon that's emerged largely because of the authority given to links and Google's transparency with respect to that authority. It's a phenomenon that takes that democratic ethos to the next level by removing virtually all the costs (financial, but also in terms of the required technical knowledge) associated with self-publication. Mix in Google's method of ranking search results, and you have a situation where millions of people have been moved to new acts of speech and are engaged in a worldwide discourse. If you hold freedom of expression dear, this is a monumental achievement.

Of course, there are also the spammers, who take the same democratizing elegance of Google's system and turn it on its head: by flooding my comment section (and yours) with links, they're able to increase their (or their client's) page rank, which means more traffic and presumably more sales. Google has lately attempted to solve this problem by removing some links from its search results -- so for instance blogs hosted with Blogger now use a nofollow tag to exclude links in comments from Google's ranking calculations, and MT has made a plugin available.

And it makes sense that Google would try to effect some change centrally rather than counting on individual bloggers to spam-proof their sites. For the bloggers, this is a classic collective action problem: if all bloggers took the same anti-spam actions, nobody would have to deal with spam; but an individual blogger's actions don't have much effect on the overall comment spam situation on the internet, so there's little incentive for the individual to act, except perhaps to get rid of local spam. Unfortunately, this isn't enough incentive for every blogger, and adding plugins or other anti-spam software can require technical skills that not everyone has, or has the will to acquire. Google, on the other hand, has an enormous incentive to act unilaterally. Their search engine, built to reflect individuals' search needs, loses some of its usefulness if it can be polluted by comment spam. The concern over this growing problem had to be one of the major considerations in their acquisition of Blogger, and it's almost certainly behind this latest change in their search algorithm. Deemphasizing blogs in their ranking calculation means deemphasizing all of that comment spam as well.

In various comments and responses I've read, there are people who view this adjustment as a good thing. Some seem to think that assigning authority to all those links has had a corrupting effect on blogs, even though blogs would never have built the powerful network between them without those links. Others hope it will reduce the spam problem, and they might be right -- although past experience suggests spammers will find a way to exploit any system that gives individual users any discretion at all. Still others are glad that search results will be easier to read without all those irrelevant blog posts, which is reasonable enough as long as you don't care about the content on blogs.

But even beyond these practical considerations (which at any rate I find uncompelling), there are important questions of value that have to be dealt with. In this case the issue is one of form -- the blog is being subordinated in search results to other forms, such as the static informational website or the online store. Does this formal subordination reflect a judgment about the value of the content on blogs? If so, it's an extraordinarily crude generalization; while many blogs may not contain information of value to others, there are also plenty of blogs that do. Are the recipes at Too Many Chefs less valuable than those at David's Yum Yums just because one is a blog and the other is not? And even if it's not an explicit judgment about blog content, the judgment is implicit in the adjustments Google has made to its algorithm, and strikingly so. Regardless of the reason, individual writers (citizen journalists!) are being written out of the equation and power is being taken from them.

What's really unnerving about this seizure is that Google and the other major search players are all corporate entities. This means that their incentive structure is (naturally) about finding ways to make a profit. So Google may care about the accuracy of its search algorithm, but only because it indirectly supports the goal of selling more ads to generate revenue, which is necessarily Google's overriding concern. From an economic perspective, there are really two problems with letting the market handle search. First, major search players have little incentive to safeguard public speech or treat it the same as corporate speech. Competition won't lead to provision or protection of public speech because it's simply not in the interests of corporate entities. The other problem is that there are insurmountable barriers to entry in search. The major players all crawl and cache huge portions of the net, but for the would-be alternative search engine, this is a serious hardware challenge. There's a reason that we don't see serious challenges to Google's dominance very often (and that when we do, it's from Microsoft).

When Google changes its algorithm to diminish the authority of blogs, we're left with other kinds of authorities -- some traditional media, some educational institutions and non-profits, but mostly businesses. That Google's advertising market consists primarily of these same businesses is obviously a conflict -- not a legal conflict, of course, since there's no regulation on this point, but rather a conflict of value systems. The central issue of this whole discussion is what it means for content to have value. In a capitalist setting we value things in terms of dollars, and even when there are other kinds of values involved we can usually find a way (through the magic of revealed preference) to convert them into dollar terms. But there are other important values that are difficult to quantify in this way. We live in a free market society. Well, in order for markets to function efficiently, there has to be a strong flow of information among consumers and firms. But this can't happen when a single corporation (or even a group of major search players) controls both the information and its presentation, and can strategically change both in response to its own incentives. We live in a democracy. Well, in order for a democracy to function properly, individuals have to have as complete a picture as possible of their government and the state of the world. This can't happen when the corporation controlling the information is willing to marginalize an entire class of speech because its own business model is threatened!

There's no strong conclusion to be made here. I don't think that Google is evil yet, and there's every reason to think they're still in the process of solving the spam problem, and that their solution will be more subtle than what we've seen so far. But that doesn't change the fact that these institutions (eg the entire store of human knowledge, and how it's organized) need to be safeguarded by someone who has some incentive to actually do some safeguarding. Certainly there will be plenty more incidents like this in the future to remind us how much power Google et al wield and for what ends; but in the meantime aren't there some basic steps we can take? The most obvious would be a demand (made by the public or the government) that all search algorithms be entirely transparent. This would obviously be a blow to the search business, but it might just be a reasonable price to pay to be the gatekeeper for all human knowledge. And as I mentioned before, the main barrier to entry in search is probably the cost of hardware rather than the algorithm; also it might be possible for the algorithm to remain proprietary even if it's public (although the lack of good international protections makes this a little thorny).

We should probably also be thinking about specific content rights concerning speech and access. Some system of minimal fairness requirements for how content is to be sorted by search engines and how this relates to both the reader and the writer of content would seem to be in order, even if they're non-binding. Some guidelines on this -- even just a more formal and gracious version of the terms of service that are already out there -- could do a lot to reassure the public, even absent the total transparency called for above.

Pamela  {February 21, 2005}

You write that your blog has dropped in rank when you search for it on Google. Well, Google doesn't list my blog yet. What is your thinking on why this is?

paul  {February 21, 2005}

I guess this might have something to do with the algorithm change, but since your blog is only a few days old, I'm guessing it has more to do with the fact that you either haven't been spidered yet or don't have enough inbound links to register. (I would recommend leaving your site address here when you leave comments, except that I have the nofollow plugin installed.)

Caleb  {February 22, 2005}

Hm ... To go along with your point about Google's own corporate interests being bound up in the algorithm, it appears that Blogger sites (owned by Google) still appear fairly high in search results. I don't know if this is by design, though.

barrett  {February 22, 2005}

The best thing you can do to get your site recognized and spidered is to get it listed at dmoz.org. Google uses those listings as part of their starting point to find content.

Very nice post, Paul. I knew this was buggin you and that you'd put some thought into it. it's nice to see it "on paper".

ulteriorepicure  {February 22, 2005}

okay... i have absolutely no idea what this is about... but thought i'd just say hello while i'm here - got your link off of the comments on heidi's blog.

come visit sometime.


paul  {February 23, 2005}

Caleb -- that's interesting, because many Blogger blogs actually have reported a major change. I'm not sure why Google would especially want to promote Blogger content, since there are no longer advertisements on those pages.
