January 28, 2013

What Aaron Had In Mind

It bothers me that so much of the press (and Wikipedia) coverage of Aaron Swartz assumes that he planned to redistribute the JSTOR articles he downloaded. That’s possible, but it’s probably wrong.

Aaron studied large collections of texts. His landmark Wikipedia research asked who really writes Wikipedia by examining the provenance of thousands of articles. He showed, contrary to what nearly everyone expected, that most of the information came from a wide range of specialists who edit Wikipedia infrequently, while a small core of very visible editors each contribute thousands and thousands of small, technical edits. He published that work himself, but it would have waltzed into ACM Hypertext or Web Science. Later, he did a similar study of legal citations for the Stanford Law Review.

This was one of Aaron’s core strengths. Swartz had a good intuitive sense of what studies would be practical on modern computers. I’ve done some work on text collections myself, but while I know it’s possible to study a million documents, all my reflexes are tuned to smaller studies. So, “Link Apprentice” used the Federalist Papers and Shakespeare, and my studies of “Patterns of Hypertext” and “Criticism” have about a hundred references. I don’t reach for the millions first; Aaron could.

There is nothing wrong with this kind of research. It’s possible that it technically violates JSTOR’s terms of service, but it’s always been part of what we mean by “using the library.” If I want to study bookbinding by looking at a million books in Widener and noting whether they’re red or blue or green, my GSAS doctorate says I have every right to do so. If I’m making too much noise or get in Professor Higginbotham’s way, then we can discuss ways to make less noise or visit the fourth floor when Higginbotham is in a better mood.

This doesn’t mean that Swartz did nothing wrong. Students do all sorts of things that are wrong. In college, we used to break into the swimming pool to go skinny dipping on spring evenings. This would have gotten Jean Valjean in a pickle, it broke any number of rules and laws, and it wasn’t really safe (though we had lifeguards), but it was fun. I knew people who visited other forbidden places: Clothier Tower (off which a sorority reject threw herself in the apocryphal past), the dome of Parrish Hall, the underground network of steam tunnels.

It’s not impossible that Swartz planned to share the JSTOR downloads somehow. But I know of no evidence for that, and there’s plenty of reason to think he was working on a new large-scale text study.