Monday, February 18, 2008

Who writes Wikipedia?

Who writes Wikipedia? Considering that Wikipedia is the world's biggest and most widely used encyclopedia, and that it ranks consistently among the top ten sites on Alexa rankings, this is certainly a pertinent question to consider. Yet, I haven't found a lot of relevant research, or even substantiated guesswork, on who actually does write Wikipedia. So I'll add here my pool of guesswork, and also give my views on the questions of systemic bias in Wikipedia.

If you watch a video by Jimmy Wales on Wikipedia, you'll notice that there are roughly two things he says. The first is that Wikipedia benefits a lot by getting anybody to edit. In other words, it is the mass of contributions, often of varying degrees of quality and quantity, that has made Wikipedia what it is today. Second, Wales points out that the masses in themselves wouldn't have produced a great encyclopedia. The real success is due to a few hundred people who are at the helm, who are in constant touch with each other, who are reverting vandalism, wikifying the place, tidying up after the masses. In response to these, we have Aaron Swartz's blog post where he describes his theory: that the bulk of the Wikipedia content is written by the fringe users, or the tail-end (as I call it), and the big shots do the little tweaks here and there.

Taking the common denominator between both these views gives me the idea that Wikipedia has the following three kinds of editors to an article:


  1. Fringe editors: These are casual readers who correct a spelling here, correct a bit of grammar there, a bit of structure there. They're doing copy-editing.

  2. Article-involved editors: These are people who have a specific involvement in the content of the article.

  3. Wikipedia-involved editors: These are people who are involved in Wikipedia per se, so their interest in editing the article comes from trying to make Wikipedia a better place.



These may not be absolute partitions, so, for instance, I may be editing an article partly because of my vested interest in its content, and partly because of my desire to make Wikipedia a better place. But I think vested interest in content is a fairly strong motivating factor to the long tail-end of Wikipedia contributions. Do I have statistical evidence for this? Not much, but I can talk of myself. All the Wikipedia articles I have written, ranging from Chennai Mathematical Institute to IMOTC to contranormal subgroup, were on topics that I had vested interests in. I've rarely done significant editing to articles on topics I have no thoughts, opinions, or biases on. Is the same true of others? I don't know a systematic and effortless procedure for collecting data, but a quick look at articles on some leading universities seems to indicate that most of the edits are done by students of the university.

The interesting thing about this is that vested-interest editing, which probably forms the long tail-end, is both tacitly encouraged and shoved under the carpet. People are encouraged to write articles on topics they consider important, which is pretty close to encouraging vested-interest editing, and yet the overt theme of Wikipedia is that people are working to build a better encyclopedia. This, at least, wasn't my aim when editing the articles on Chennai Mathematical Institute or IMOTC. My idea was simply to have a reliable information source on these that lent credibility to the respective institutes. The aim wasn't to improve Wikipedia but to leverage it.

Does vested-interest editing contradict Wikipedia's neutral point of view policy? Not directly. However, the fact that people editing an article are themselves largely a self-selected group of vested interests should be factored into the implementation of the neutral point of view policy. First, if there are vested interests, the vested interests should also fully understand Wikipedia's policy and agree to abide by it. Second, there should be a uniform representation of different kinds of vested interests. Third, the Wikipedians who're trying to tidy up shouldn't have any vested interests themselves.

It is fascinating, because I have noticed a lot of bias and vested interests on Wikipedia, often hidden under the carpet by saying that anybody is free to participate. How does the systemic bias arise?

Let's first look at biographies and articles about people. Since Wikipedia is a place for no original research, that means that a Wikipedia article on a person isn't going to be based on firsthand interviews and information gathering about the person. Rather, it is going to be a collation of data and facts that have already been established in other sources. Nonetheless, there is a lot of selectivity and subjectivity as to what exactly goes into the article about a person, or a company. Again, I'm not aware of any research that has systematically looked at Wikipedia biographies.

I am aware that biographies of certain individuals and articles about certain organizations come, from time to time, under scrutiny. For instance, there was the controversy surrounding the biography of John Seigenthaler, Sr. And it often happens that pages about leading personalities like George Bush need to be locked temporarily to prevent vandalism.

On the other hand, Wikipedia's very touchy about people editing their own biographies. Although Jimmy Wales has himself been caught editing his biography, this is generally looked down upon. Tools like the WikiScanner are meant to track down organizations editing their own articles, and big organizations like Microsoft and Pepsi have been caught at this.

In a similar vein, the article on Emotional Freedom Techniques, a controversial psychotherapeutic tool, isn't supposed to be edited by proponents, where proponent could mean anybody who might be making money out of it, or might otherwise think it works.

On the one hand, those with the strongest vested interests in an article aren't supposed to edit it. On the other hand, we have the fact that those who do edit the article do have a vested interest in it. Moreover, it also depends on the nature of the vested interests. For instance, the article on Vidyamandir classes has been deleted on account of being non-notable, though the number of people who've heard of it and who're affected by it is probably significantly larger than the number of people who've heard of IMOTC or are affected by that. What's the difference? It's not hard to guess. The former has a profit-oriented vested interest. The organization is a profit-making organization. The latter, on the other hand, is a government-run camp with no profit aims.

In other words, there is a selectivity as to what kind of vested interests aren't allowed to participate, which automatically means a selectivity as to what vested interests are allowed to participate. The same is true for Emotional Freedom Techniques: most of the people who participate in the discussion aren't people who've had extensive experience with it. Why? Because anybody who's had extensive experience with EFT is likely to be an advocate of EFT, and any kind of advocacy spells undesirable vested interests.

The reason why this bias and lack of suspicion exists is, once again, the way Wikipedia is built: it relies on the long tail-end to produce masses of garbage and then filters out whatever is most relevant. Since people who start out with editing Wikipedia usually have very little clear idea or understanding of its policies, the standing assumption is that if an edit looks like advocacy, it probably is. In other words, the very freedom to edit means greater suspicion that certain vested interests will operate, which means suppression of those vested interests, which results in a bias the other way.

My suspicion is that this is a problem. In his TED talk, Jimmy Wales pointed out that the page on George Bush was only protected for about 1% of the time during the controversy. But that misses the point: if an encyclopedia relies on the good of the people, why should any protection be necessary? How can Wikipedia strive to be a top-quality encyclopedia providing reliable content, if it is subject to vandalism, "point-of-view" biases, and aggressive, trust-free debates?

What are the possible solutions within the Wikipedia framework? One solution is to provide people and organizations a redressal system. I may not be allowed to edit my Wikipedia entry, but I should still have the ability to post a letter to a relevant authority if I feel the entry misrepresents me (providing adequate references, or pointing out that the entry lacks references). And I should be able to do this effortlessly, through the website, rather than having to call my lawyers or do hard-talking. In other words, a redressal system that is publicized, and where grievances are attended to immediately, could be better.

Secondly, it's an interesting question whether we should have articles on living persons, or active organizations, at all. After all, Wikipedia isn't meant to cover everything. If it were, then articles wouldn't be deleted as non-notable. So it is questionable whether we should have articles describing all the details of a particular living individual, or whether we should have articles on political figures, or whether such articles should contain anything except biographical information. Or, one could ask, should Wikipedia articles cover anything recent?

There are related things. For instance, in courts of law, juries aren't supposed to be told of past crimes that a person may have committed, while evaluating whether a person has committed a particular crime. In similar contexts, newspapers aren't supposed to publish criticisms of individuals that are unsubstantiated (there are notions of libel and slander). For all the lip-service that Wikipedia pays to taking these seriously, there is still nothing to stop an individual from putting in a "well-thought criticism" of somebody, and letting that criticism affect a number of readers, before somebody corrects it. In other words, the tail-end benefits to Wikipedia are larger than the possible detriment of thoughtful bias injected into an article.

I'll get back with more thoughts on this. If any of you has done research on this, or knows of a way to collect reliable data, I'd be glad to hear your views.