Monday, May 9, 2011

Standards of Citation and the Internet

Bland Whitley

Why do we cite sources? I imagine that for most of us, annotating work has become second nature to such a degree that we rarely think about why exactly we’re doing it. I’ll stress two main reasons, though I’m sure others could think of different rationales. The first is a kind of reflexive establishment of scholarly bona fides. As undergrad and grad students, we were taught to base our arguments on the sources and authorities we consulted—you may vaguely recall those dreary high school assignments that required some minimum number of sources. All of this remains of course an essential building block in the development of historical understanding. It is through immersion in a variety of sources that we learn to build arguments out of a variety of competing claims and to establish a sense of the relative reliability of different texts and evidence. The second reason grows out of scholars’ relationship with one another. Whether collaborating or arguing, scholars require access to the evidence that informs particular arguments. Although these rationales are not mutually exclusive (they often reinforce one another), the second should command greater respect. Leading other scholars to one’s evidence, so that they can reach similar or very different conclusions, is what citation should deliver. Too often, though, we can all find ourselves practicing a strategy of citation for citation’s sake.

I’ve been thinking about these issues because of an interesting debate that has played out on a couple of listservs during the previous two weeks (H-SHEAR, geared toward historians of the early republic, and SEDIT-L, which serves scholarly editors). Daniel Feller, senior editor of the Papers of Andrew Jackson, kicked things off with an impassioned critique of lazy citations of material culled from the web. Singling out a few different recent works that have quoted passages from important addresses made by Jackson during his presidency, Feller found that the works were citing either internet sites of dubious scholarly quality, one of which was no longer live, or obscure older works that neither improved on contemporary versions of the text nor took advantage of the contextualizing annotations of modern versions. Why should this be the case, Feller asked. It’s not hard to find print versions of the original sources for Jackson’s addresses. Indeed, it’s never been easier, as all can be found either on Google Books, or through the Library of Congress’s American Memory site. Instead of taking a couple of extra minutes to track down better and more useful source material, the authors had stopped searching after finding the desired text on whatever website seemed halfway professional and then cited the link, no matter that such links frequently have the shelf life of a clementine.

The response to Feller’s post has ranged from attaboys from traditionalists who view the internet as little more than a dumping ground/series of tubes for scholarly quacks, to condemnation of yet another attempt by an academic to marginalize “amateurs.” (Why is it that all listserv conversations seem to devolve into a spat between angry researchers impatient with professional norms and defenders of some mythical historical establishment?) One commentator referred to articles that have analyzed the high percentage of historical citations of websites that have become defunct, a phenomenon known as link rot. Another pointed out that citing a website that may soon go dead isn’t really all that different from citing an unpublished conference paper or oral history—in neither case is the source material truly available to anyone else. Feller, of course, wasn’t really criticizing publishing or citing material on the web. He was warning that the proliferation of source material on the web has degraded historians’ citation standards.

There are two issues at work here. First, how do we handle link rot? This is a conundrum with no easy solution. Increasingly, all people interested in history, scholars and aficionados alike, will be getting much of their information from the web. What is our responsibility for ensuring that others can check our source material? If we have a reasonable expectation that a given website might not be around for very long, should we even bother citing it? If source material becomes problematic simply because of the ephemeral nature of the venue on which it is found, however reputable, how do we convey its legitimacy as evidence? The second issue relates to the question of what constitutes an authoritative text. The web has dramatically expanded researchers’ capacity to obtain and analyze primary and secondary sources—public records, newspapers, transcripts or digitized scans of correspondence, and obscure county histories, formerly accessible to only the most dogged and sophisticated researchers, are now readily available to anyone. But the web has done all this at random. The Eye of Google™ gazes upon some works but not others. Outdated and overly restrictive copyright laws prevent the sharing of many works. Researchers looking for specific texts to buttress their arguments encounter (through the workings of the search engine) sources that they otherwise would never have considered consulting. Before, researchers would have learned what specific sources one needed to look up when seeking the text of, say, the electrifying second annual message of Millard Fillmore. Now, enter a few key words, and voilà: Maybe you’re more interested in Fillmore’s controversial 3d annual message and prefer it from a printed work? Boom:

Is the above http address a legitimate source for citation? It’s a well-done, university-backed website, and I can only assume (having neither the time nor inclination to verify) that the text is presented accurately. I certainly wouldn’t hesitate to direct students to it. So why not? Well, what if UC-Santa Barbara loses the site’s funding or otherwise decides to pull it, and the site goes dead? Can we depend on other researchers to retrieve it from some archived site (the Internet Archive’s Wayback Machine)? What about the printed source? What of a recent reprint of James D. Richardson (something of the court historian for the nineteenth-century presidency)? Perhaps you’re interested in U.S. relations with Cuba and need to discuss the Fillmore administration’s rejection of British and French entreaties to forswear annexation of the island. That’s covered in the edition (p. 212), so you could cite it as a source. But beware: Google offers only a summary view of the book. Although you might be accurate in locating Fillmore’s rejection of the British-French tripartite arrangement, you’d be obscuring the incompleteness of the edition you consulted. Rather than helping other researchers, the citation would simply reflect the ease with which specific texts can be found on the web. In cases where the source is not unique (unlike, say, a manuscript letter, diary, or newspaper), citation, when it’s necessary at all, should go beyond merely indicating where one viewed the text. It should point readers to the scholarly apparatus that makes the particular source useful and authoritative.

There’s that word again—authoritative. Now we enter the realm of scholarly editors, who take a special interest in presenting historical and literary texts that are built for the long haul. I’m going to go out on a limb and guess that part of Feller’s justified pique grew out of a realization that not only were the Jacksonian scholars he reviewed citing somewhat dubious sources, they were not consulting The Papers of Andrew Jackson. I experience the same frustration in my work with the Papers of Thomas Jefferson. An all-too standard pet peeve is coming across recent scholarship that cites, not our series, but Paul Leicester Ford’s earlier edition The Works of Thomas Jefferson. Now, there’s nothing wrong with Ford. If one is looking to quote TJ, many of his famous writings are covered in that edition. But Ford’s project was very different from the comprehensive, annotated approach undertaken by modern documentary editions. Not only do modern editions present text more accurately, they present it in context. The primary subjects’ words appear along with the incoming correspondence that might have prompted them. Annotations connect text to other primary sources, as well as to modern scholarship. There is, in short, a wealth of information, both critical and ancillary, that is useful to readers.

So why do so many people continue to rely on Ford? Because his edition has been scanned into Google Books and therefore is convenient for anyone unwilling or unable to search beyond a desktop. Now, I can understand that a lot of researchers out there may not have the institutional support of a major research library and therefore can find it a challenge to get to modern documentary editions. The volumes are expensive, and the work of getting them online (although ongoing) may not occur quickly enough to satisfy everyone—nor does it necessarily lower the price. Still, it seems to me that the facility of the web has encouraged a kind of entitled sensibility among many researchers, who become miffed when something is not available online for free. The kind of scholarship that fills documentary editions costs money, though. Editions may or may not have the ability to publish online with no expectation of remuneration—university presses do, after all, require some return. The internet, however, has untethered the connection between the free consumption of information and its labor-intensive production. Too many researchers, accustomed to getting so much of their information for free from the comfort of the coffee shop, seem increasingly unwilling to do the legwork necessary to gain access to superior sources. Instead they settle for the merely adequate. That’s a shame.

I don’t want to imply that there’s anything wrong with citing material from the web. It’s essential and will increasingly account for much of the information that ends up in our works, particularly as online publication becomes more prominent. We do need to be sensitive to the issue of link rot—the Chicago Manual has some useful hints in this regard, and I am hopeful that archivists and librarians, who are far more advanced in these matters, will come up with some viable solutions. More broadly, the bounty of the internet need not fundamentally alter what we choose to cite as evidence. Standards will and should evolve with the times, but we should not displace one set of works with another simply because the new batch is easily and freely obtainable. Any shift should be based on the responsibility we have to our readers to connect them with the best available sources, print or web-based.


hcr said...

This is a different perspective on this issue than I had considered before. Thank you for it.

My own approach to the internet/hard copy citation has always tried to address the question of how will the majority of readers find it easiest to check my sources, no matter what happens to the internet in 100 years. My solution has been to cite actual hard copy books that are also available on-line, so I figure I'm covered either way. (And, of course, to cite library materials if those same materials haven't been scanned into the internet.)

But that does beg the question you note here: what happens if I want to cite something that is only available on-line? This has just happened to me for the first time, and, ironically, it's a piece from this blog. If some of those articles Randall posted on Robert Darnton's views of internet publishing are correct, it would be a good thing for academic publishing to rest much more heavily on the web. When that happens, we'll have the problem of link-rot in spades.

Thanks for this. It's got me thinking (and worried!).

Chris Beneke said...

Nicely done, Bland. This is a long overdue discussion. My approach has been much like Heather's, but I'm taken by your argument that updated scholarly editions offer a great deal more than their predecessors. And your point that the "internet has untethered the connection between the free consumption of information and its labor-intensive production" is spot on.

Randall said...

An author is more likely to face link rot (I never heard that word, but will now be using it constantly) if he/she is writing on a contemporary topic. I've worried about this some on the co-authored book that's coming out in Sept. We have quite a few on-line sources: Christian products, music pages, end-times websites . . .

I do wonder if tools like the Wayback Machine can help this.

Lisa Clark Diller said...

Thank you so much for reminding us about the ephemeral nature of so many online links. I have found this when I've included links to university sites which have primary documents in my syllabi. We can't forget what it is that citation is supposed to do--and if people can't follow the links, that's a real problem.


dan allosso said...

This is a very useful post, and I'm always happy to see someone taking a strong position on an issue. So, let me do likewise:

I sense a whole lot of authority in the arguments on the listservs, and I think I see some of it echoed here. I feel that underneath this issue, there's a raging sea of concern about standards of the profession, and what might be called the great chain of scholarly authority. And I DO think the web is messing with that, but in a good way.

Authoritative compilations have their place, and I would never argue against looking at the most recent one, when a researcher needs to understand the context of a statement, what may have prompted it, etc., as Bland says. However, much of the time, we already have a pretty good idea about the context of a quote we might want to pull out -- especially from a well-known figure like Jefferson. If I wanted to make a point in a paper or a blog post about Jefferson's agrarian idealism, for example, I might be satisfied with an online compilation that helped me quote a passage from _Notes on the State of Virginia_ correctly. Might not be as important to me, to know what the latest Jefferson researchers had to say about it.

"An entitled sensibility among researchers..." really? And, "The internet has untethered the connection between the free consumption of information and its labor-intensive production." The implication seems to be that my blog post on Jefferson will clearly be inferior to whatever work I base it on, and certainly that my "consumption" will be less "labor-intensive." I'm not sure I'm comfortable with the hierarchy implied in these statements.

There's a range of sources we cite, as Bland says, and the reasons for citing vary with the types of sources. As I understand it, I cite primary sources both to prove that they exist, and to let others know where they can find them -- in hopes, perhaps, that they might find what I've done with them interesting, and may want to take a look at these sources or others like them. The other issue, positioning oneself within the historiography, seems to be the part that's fraught with all the subterranean emotion. And I think that part deserves a lot of examination and debate, at this social and professional juncture.

How deeply embedded in the great chain of scholarship do I want to be? How much of my time do I want to spend with primary sources, and how much do I want to reflect on what others have said? At what point in its life-cycle can an authoritative compilation or an edited selection of conference papers begin to be read as a primary source itself, reflecting the time and place where it originated? Should I answer these questions differently if I'm writing for an academic audience vs. the public?

Bland Whitley said...

Thanks to all the commenters. To Dan, certainly the listserv conversations that sparked my post exposed the subterranean emotion you alluded to--in spades. And your discomfort with the hierarchy I established between production and consumption is well-taken. There is certainly more of a scholarly loop than a chain of authority, something the web very helpfully promotes. And like I said, I don't think immersion in modern scholarship is necessary in all cases. It doesn't seem to me that one would ever need to cite, say, TJ's "chosen people" quote, other than to say it comes from Notes on the state of VA. For non-unique sources, I'd prefer citation to be used more sparingly. It does still seem important to me, though, that we acknowledge that there can be qualitative differences between two sources that contain the same raw evidence. Context, not so much from the prevailing historiographical trends/positioning but from the historical events and conversations that prompted certain series of words, matters. Without some sense of that context, we end up analyzing our subjects through the prism of vague generalizations, rather than as the flesh and blood figures they were.

dan allosso said...

Thanks, Bland. I agree completely about context. Right at the moment, I'm thinking about that in terms of setting, as I try to work out how much to make the places where my history unfolds into historical actors (almost, characters?) in their own right. But even in the case of familiar pieces like Jefferson's Notes, I suspect you're right that some context is helpful. Don't people generally ignore the fact that it originated as part of a political dialog with France?

Yvonne Perkins said...

At the end you raise the important issue of money. I love free, open access but we need to recognise that for good material, the author or producer of that content has spent hours and hours of their time. This time quite often reflects significant investment in training and years of experience. It is great having free access to their work on the web but creators of content cannot live on our praise alone. There are powerful reasons why knowledge should be freely accessible for all, but there are also indisputable reasons why we should pay people when we benefit from their knowledge and research. But how do we reconcile these two principles?

Bland Whitley said...

The question of money is critical here, and one that could merit a second post. For now, I'll say that historical production shares some of the same burdens that afflict daily newspapers or the music industry. The internet has destabilized the ways that work has previously been monetized (however modestly). There's no way to put that genie back in the bottle, and I wouldn't argue that we should try, anyway. We all benefit too much. I can only hope that over time we can ensure some methods of remuneration for the value that researchers add to this wealth of information. Otherwise, we all end up like songwriters with unappealing voices--providing content with no hope of royalties.

LD said...

This is a great post, and this is a key sentence: "The internet, however, has untethered the connection between the free consumption of information and its labor-intensive production."

Lord, don't get me started on "free" information from which everybody profits but the people who produce it.

It seems to me that we are at the ragged liminal edge of a major cultural and epistemic shift. Anxieties about authority and authorship and their relation to knowledge are well-founded, in the sense that they point to the immediate site of transformation. But it's hard to say what that transformation is or might mean --

I guess a good place to begin would be to think about what all those things have meant. That lets us know what's at stake in the change. I think it's something big -- like, epistemology with a capital E.

Anyway, great post.

dan allosso said...

Let's not forget that the old scheme of music publishing wasn't really that good for artists -- it was great for music publishers. But no one suggested limiting access to guitars and drums, so that regular people wouldn't be able to give away their music for free and ruin the industry's cozy arrangement.

While I agree that there are values that are not captured by market calculations, I think we also need to accept that there IS a market for ideas and scholarly work. The way the system works right now, work with a broader popular appeal can sell for less, because it sells more units. Highly specialized work with extremely limited readership is available for specialists in university libraries, and generally these journals and monographs sell at prices that take them out of the game for scholars without academic access.

I assume we're not saying that if a book sells a million copies, it necessarily has less scholarly value than one that sells four hundred. But I don't understand the analogy to songwriters with unappealing voices. Why not learn to sing?

Bland Whitley said...

Well, not everyone has a money-making voice, right? Someone like Harlan Howard could make a good living because other people performed his songs, sold records, and gave him a cut of the royalties. But if people download those songs for free, there's no cut for him. That doesn't make the old system perfect--just means the new regime has lost a place (for now) for that kind of producer.

M.J.G. said...

Great article.
As a young researcher who actively makes use of online sources, I recognize the concern that many scholars will rely on merely adequate sources rather than bona fide ones. However, I also believe that the scholars and researchers who seek to make a name for themselves and their work will continue to do more "traditional" sleuthing in archives, libraries and historical societies. Just as we know not to rely on one contemporary view of history to fit all purposes, I'm hoping we'll all learn not to rely only on "free" sources.

Peter said...

I think that the issue is being made more complicated than it need be. Link rot is an "old" issue for digital librarians, who are concerned about the permanence, accessibility, and integrity of their archives.

Scholars can deal with this by preserving a copy of pages they cite. When citing a document that is an image of a print work, one can simply cite the printed work and its page number. When citing a web page that contains text, one should whack the page to disk and save it, producing it for anyone who asks to see it -- including publishers and copyeditors, who should always ask for proof anyway. Browsers now come with plugins that save pages; one can also print the image to PDF. Browsers will also configure the printout to include the date of access, URL, etc. Disk space is cheap -- a lot cheaper than rows of filing cabinets filled with yellowing paper -- so this is an important way to preserve and display the sources one cites.
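The save-a-copy habit described above could be sketched as a small script. What follows is only a minimal illustration in Python, not a tool the comment names: the functions `save_snapshot` and `verify_snapshot`, the file layout, and the JSON record are all hypothetical. It assumes the page's bytes have already been downloaded (for example, with the standard library's `urllib.request.urlopen`), then writes the copy to disk next to a record of the URL, the date of access, and a SHA-256 checksum, so that the copy can later be shown to be unaltered.

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path


def save_snapshot(url: str, content: bytes, archive_dir: Path) -> Path:
    """Save a copy of a cited page plus a record of where and when it was seen.

    Hypothetical helper: the naming scheme and record fields are assumptions.
    Returns the path of the JSON record written beside the saved page.
    """
    archive_dir.mkdir(parents=True, exist_ok=True)

    # Name the saved copy after its checksum so identical pages share one file.
    digest = hashlib.sha256(content).hexdigest()
    page_path = archive_dir / f"{digest[:16]}.html"
    page_path.write_bytes(content)

    # The record holds what a citation needs: URL, access date, and checksum.
    record = {
        "url": url,
        "accessed": datetime.now(timezone.utc).isoformat(timespec="seconds"),
        "sha256": digest,
        "file": page_path.name,
    }
    record_path = page_path.with_suffix(".json")
    record_path.write_text(json.dumps(record, indent=2), encoding="utf-8")
    return record_path


def verify_snapshot(record_path: Path) -> bool:
    """Return True if the saved copy still matches the checksum in its record."""
    record = json.loads(record_path.read_text(encoding="utf-8"))
    saved = (record_path.parent / record["file"]).read_bytes()
    return hashlib.sha256(saved).hexdigest() == record["sha256"]
```

An author asked to produce a cited page could hand over the saved file together with its record; running `verify_snapshot` on that record gives a quick check that the copy has not been changed since the access date. The checksum only proves the file is the one that was saved, not that the website ever displayed it, so a public archive copy remains a useful complement.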

In the end, the argument about having trouble citing print versus online is really a case for putting everything online, where it is accessible, can be cited, and can be saved and checked with minimal effort. One can then link directly to sources and if the link is dead the author can be queried to produce a copy of the page she cited. If she can't produce the copy and the link is dead, then the reference should be considered bogus.

Daniel Feller said...

I want to thank Bland Whitley for his thoughtful essay, which I was not aware of until it was cross-referenced on H-SHEAR a week after it was posted. Unlike some respondents, Bland understood my point exactly. I was not complaining about the use of the internet (or of books old or new) per se. I was decrying reputable scholars' sloppy reliance on secondhand and often defective information when the primary sources are so readily available. Thank you, Bland, for reading me carefully and getting me right.

I did wince at one point in Bland's essay, when he called James D. Richardson "something of the court historian for the nineteenth-century presidency," because this phrase might lead readers to misconstrue the issue at stake. Richardson was acting as a scribe, not an author. His Compilation of the Messages and Papers of the Presidents is just that: not a time-honored history, but a straight compilation of primary sources. Richardson copied his texts right out of the original printed versions in the House and Senate journals. One might do this badly, but in fact he did not. Historians (including us) have been over his texts thoroughly, and they are faithful reproductions of the originals, even to such incidentals as punctuation and capitalization. For presidential messages, Richardson is as reliable and safe to use as any modern documentary edition. Still, as I pointed out, if you want to be extra-cautious and not trust Richardson, the originals themselves can be found at the touch of a button. Given this, relying on later and less known reprintings, or on truncated and inaccurate website copyings, is simply inexcusable.

Bland's guess that I'm also piqued at scholars' non-use of The Papers of Andrew Jackson is well reasoned but wrong in this particular case, for the very reasons explained above. In fact, as a matter of policy we do NOT reprint in our series the official delivered texts of Jackson's presidential messages, precisely because the original (and therefore by definition authoritative) printings already exist in formats that are as widely available and easily manipulable as our volumes can ever hope to be. Why burn 20 precious pages on reprinting an annual message when the actual original printed pages of that message are not only in libraries all over the country but are up there for free -- and searchable! -- on the Library of Congress website? By not pointlessly duplicating readily accessible official message texts, we free up precious space to publish things that are new and valuable -- among them handwritten drafts of those selfsame messages, which have never seen print before and which are often revelatory regarding how and by whom Jackson's policy statements came to be framed.

- Daniel Feller
Editor/Director, The Papers of Andrew Jackson