News, Views and Careers for All of Higher Education
April 22, 2005 Inside Tech Ethics
I am among the few professors who can identify a corrupt Shakespearean manuscript — an inferior facsimile of Hamlet, say, that an Elizabethan actor recited to a printer in return for a beaker of ale. I would compare that manuscript to another version closer to the original, detecting phrases and locutions that better embody the Bard’s verbal genius.
Shakespeare never published his plays, of course. But some actors were better at remembering lines than others. Thus, several variants of a given work might exist. A good textual editor can discern which versions are “fairer,” or more authentic, than others more “foul” or corrupt.
I have been thinking about Shakespeare, born April 23, 1564, and died on that same date, at age 52. I’m age 52. By what measure will I be remembered by the digital literati with a research specialty like mine, seemingly worthless at the dawn of the Internet age?
Perhaps not totally. A few years ago, at a university where I used to work, a colleague was sending anonymous libelous memos, which I analyzed the way I used to review passages from plays. You see, over time, each of us develops a distinct textual signature. We may be given to odd phrases, locutions and colloquialisms, such as “in regards to” or “clearly, it seems” or “in cahoots with,” as in, “In regards to his annual review, clearly, it seems, John Doe is in cahoots with the Dean.” Collect enough writing samples, and you can identify the likely source of such a sentence, just as you can discern a fair from a foul excerpt of a Shakespearean play.
Granted, my application of textual analysis was trite compared with that of Shakespeare Professor Don Foster at Vassar College, known for outing journalist Joe Klein as the anonymous author of the 1996 book Primary Colors.
In the case of the defamatory memo-writer, I ran suspect phrases through thousands of Eudora e-mails and identified the culprit. I did not turn him in, fearing the prospect of having to explain textual studies in a toxic workplace. Soon after, the person left an original of his latest missive in the photocopier, with his handwriting on the back side of the recycled paper. I put a notice in the mail room that I knew who the person was and demanded that he cease and desist, which, to my surprise, he did.
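For readers curious how such a phrase search might look in practice, here is a minimal sketch. It is not the procedure used in the episode above, which the essay does not spell out; it simply counts a handful of marker phrases across each candidate's writing samples and ranks the candidates by total hits. The function names, the marker list (borrowed from the example sentence earlier in the essay) and the sample texts are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical marker phrases, borrowed from the essay's example sentence.
MARKER_PHRASES = ["in regards to", "clearly, it seems", "in cahoots with"]


def phrase_profile(texts, phrases=MARKER_PHRASES):
    """Count how often each marker phrase occurs across one writer's texts."""
    counts = Counter()
    for text in texts:
        lowered = text.lower()
        for phrase in phrases:
            counts[phrase] += lowered.count(phrase)
    return counts


def rank_candidates(samples_by_author, phrases=MARKER_PHRASES):
    """Rank writers by total marker-phrase hits, highest first."""
    totals = {
        author: sum(phrase_profile(texts, phrases).values())
        for author, texts in samples_by_author.items()
    }
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Invented sample data, for illustration only.
samples = {
    "writer_a": ["In regards to the budget, clearly, it seems we are late."],
    "writer_b": ["Regarding the budget, we appear to be late."],
}
ranking = rank_candidates(samples)
```

Raw counts like these only point to a likely source. A careful analysis would need many samples per writer and some allowance for how common each phrase is in ordinary office prose.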
That one episode aside, my research acumen was meaningless in an era of cell phones, personal digital assistants, laptops, home computers, Internet, iPods, blogs, and other technological wonders. So I forsook iambic pentameter and focused my scholarship on Pentiums.
Last year I was fact-checking the final manuscript of my new book Interpersonal Divide: The Search for Community in a Technological Age (Oxford University Press, 2005), when I found that 30 percent of my Web-based footnotes no longer functioned on the Internet. Footnotes malfunction for many reasons — technicians reformat folders and redesign sites or, especially worrisome, revise content at the same online address.
That was the case in the introduction of my book with a reference to Microsoft’s mission statement, which I had retrieved in late 2001 from the Web. At the time it stated: “Microsoft and its employees recognize that we have the responsibility, and opportunity, to contribute to the communities in which we live, in ways that make a meaningful difference to people’s lives.”
Two years later, Microsoft’s values apparently had changed. Now it vowed to show “leadership in supporting the communities in which we work and live.”
The change in wording may be subtle, from Microsoft’s perspective, but it corrupted my citation. Any scholar checking my references would wonder whether I fabricated the footnote.
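One way to notice this kind of silent revision, offered only as a sketch, is to record a fingerprint of the cited passage on the day it is retrieved and compare it with the page text later. The helper names below are hypothetical, and the "later" string is just the fragment quoted above, not the full revised page.

```python
import hashlib


def fingerprint(text):
    """Return a SHA-256 digest of the text after collapsing whitespace."""
    normalized = " ".join(text.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def citation_still_matches(recorded_digest, current_text):
    """True if the text seen now hashes to the digest recorded at citation time."""
    return fingerprint(current_text) == recorded_digest


# Wording quoted in the essay as retrieved in late 2001.
cited_2001 = (
    "Microsoft and its employees recognize that we have the responsibility, "
    "and opportunity, to contribute to the communities in which we live, "
    "in ways that make a meaningful difference to people's lives."
)
# Fragment quoted in the essay as appearing two years later.
seen_later = "leadership in supporting the communities in which we work and live."

recorded = fingerprint(cited_2001)
unchanged = citation_still_matches(recorded, cited_2001)
drifted = not citation_still_matches(recorded, seen_later)
```

A digest shows that the wording changed but cannot show what it used to say, so it complements a saved or printed copy rather than replacing one.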
Alas, there was little I could do as an author writing about the Internet. Simply, I had to cite Web pages. I also had to rewrite my book manuscript for Oxford, but this time I printed out copies of Web sites used in footnotes and sent two thick binders to my editor to validate sources.
Then a colleague, Assistant Professor Daniela Dimitrova, and I documented the citation problem in a quantitative study that has been selected as a top paper in the technology division of the International Communication Association convention. We present our findings and recommendations on May 29 in New York City.
Our study is important news, especially for librarians, who have spent billions on computers and subscriptions to online journals whose footnotes often are as ethereal as Ariel in Shakespeare’s The Tempest, believed to be his last complete play.
The most famous line from that comedy — “We are such stuff/ As dreams are made on” — was recited by the magician Prospero after he makes a bevy of spirits vanish, to remind us that life is brief.
Our time here may be fleeting — “Out, out brief candle!” — but footnotes are not supposed to be. When online citations extinguish, every discipline is befouled, because replication, at the heart of the research process, becomes difficult without stable archiving, which libraries used to provide.
It was called a book shelf, as in Shakespeare’s day.
The Bard never published his plays, perhaps because he did not trust the new medium of the printing press. It cut into his profits as a playwright, much the way the Internet cuts into the profits of Puff Daddy when digital pirates purloin his rap.
But there are also vast differences between the printing press and the Internet, especially when it comes to books. A book is the ultimate fire-walled medium, with exact printed copies distributed to libraries. Manipulate books in a library, and you are committing a crime, literally. Lest we forget, there is a reason for that, which has to do in part with scholars needing the work for a reference. That is not the case with the Internet, which allows a user to select, copy and paste original works into a folder called “My Documents,” as if that user had authored them.
Now imagine if Google digitizes entire libraries. True, computer patrons will be able to copy and paste text from books—and then re-write and re-distribute variants via the Internet — but in time, who will be able to distinguish fair copies from foul derivatives?
I would, thanks to my doctorate in English.
Since most of us don’t have doctorates in English, the Google Factor is indeed a concern and one that needs to be addressed. Thank you, Dr. Bugeja, for sounding the alarm.
Susan Porter, Scripps Howard Foundation, at 12:05 pm EDT on April 22, 2005
As I’m sure you’re aware, http://archive.org is one attempt to archive a fraction of the web.
Mark Crane, One archiving attempt at Utah Valley State College, at 6:26 pm EDT on April 25, 2005
How do/can efforts like the Wayback Machine to preserve earlier versions and locations for webpages fit into this? Do they?
Mr. Peabody, at 3:58 pm EDT on April 26, 2005
Thanks for your thoughtful comments. We have looked into both archiving methods, essentially snapshots of the Web; they cannot counteract the several ways footnotes deteriorate. Sometimes technicians whose truth is server space reformat footnotes. Sometimes companies whose truth is profit archive them for a fee. Sometimes datafeeds into libraries insert spaces where none should exist in a footnote, generating the “%20” computer symbols. Sometimes new content is added at the same URL. Sometimes Web pages come down. Sometimes domain names are transferred. And so forth and so forth. I can list several more reasons why footnotes lapse, but the moral is that Internet archiving as we now know it cannot preserve footnotes as if they were printed on paper in the firewalled medium of the book, upon which our scientific method relies. Thus, the thrust of our work, which may be a life’s work (or half-life’s work, as I will turn the project over to Dr. Dimitrova at some point), is a new methodology for the Web that takes into account its dynamic features. We don’t know if this is a possibility; at present, everything from convenience study to intercoder reliability is at stake. (Personally, I doubt that we can devise a methodology that works unless it is backed up by paper, which is why I print out my URLs.) My own thesis is that the new dominant medium of the Internet is too commercially based and dynamic for serious scholarship, and that libraries have bought into technology in such a big way that many librarians view their jobs now as dissemination managers for data rather than gatekeepers for fact. In an ideal world with proper funding, libraries would continue to buy print and subscribe to databases and digital journals. However, as most scholars want quick access to data, which the Internet readily supplies (and then corrodes), I think one day we’re headed for the world that I describe in my whimsical essay in Inside Higher Ed.
Michael Bugeja, Director at Iowa State University of Science and Technology, at 7:43 am EDT on April 28, 2005
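The stray-space problem mentioned in the comment above can be illustrated with a short sketch. When a space creeps into a Web address, standard URL encoding writes it as the three characters %20, and the result no longer matches the address that was cited. The URLs here are hypothetical placeholders, not addresses from the study.

```python
from urllib.parse import quote, unquote

# Hypothetical footnote URL, and the same URL after a data feed inserted a space.
clean_url = "http://example.org/mission/statement.html"
damaged_url = "http://example.org/mission/ statement.html"

# Percent-encode the damaged address, leaving ':' and '/' untouched.
encoded = quote(damaged_url, safe=":/")

# Decoding restores the stray space, not the address that was originally cited.
decoded = unquote(encoded)
still_broken = decoded != clean_url
```

Under these assumptions the encoded address contains %20 where the space was inserted, which is one way a footnote that was correct in manuscript can stop resolving.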
Fair is foul, and foul is fair:/ Hover through the fog and filthy air.
Paul M. Wright, Editor at University of Massachusetts Press, at 12:25 pm EDT on April 28, 2005
The Internet certainly increases the likelihood of errors being replicated. And, as we all know, while we may discount something read in one place, it is more difficult to do so if one sees the same information in, say, five or six places. Indeed, at that point, people generally accept it as fact. Anyone who has spent much time searching online genealogies can quickly attest to the fact that errors posted in one genealogy are quickly picked up and reposted in many others due to the prevalence of the GEDCOM format for importing another genealogy into one’s own. Find one common ancestor and you can instantly import all of the other common ancestors. Now, if one takes the time to painstakingly vet and verify the information in the imported file, then the process is wonderful. Among amateur genealogists, however, importing multiple files and then reposting the final product appears to be disturbingly common.
At the same time, pockets of order and discipline already exist on the Internet. In the world of digital libraries, I am especially confident that accuracy will ultimately prevail. Unlike genealogies, where people are citing resources such as family bibles and other sources not generally available, many copies exist of most of the content in today’s libraries (16th century Shakespeare texts excepted).
As the world’s leading Internet-based, online academic library, Questia has scanned and digitized the full-text of over 56,000 previously published books and nearly one million articles. While we have undoubtedly introduced some errors along the way, I believe they will be corrected over time. What is more worrisome is that many students will use whatever they can find on the Internet for free rather than the academically vetted, peer-reviewed content available through Questia.
For centuries, students doing research have used materials that have gone through two filters before they use them: 1) the editorial process or peer review and 2) the collection development process of their college/public library. Before something is published, it must get past an editor. Then before it makes its way into the brick-and-mortar library, a collection development librarian has to decide it’s worthwhile to purchase and put on the shelf.
While I am a fan of blogs, they and much of the rest of the content on the Internet have not gone through either the editorial or collection development filters or any other filter for that matter. As a result, the content is not reliable or credible for research. In contrast, Questia replicates these filters on the Internet by providing online researchers with the same high-quality content that is in the traditional library: we have agreements with over 250 leading academic publishers to include their books and journals, but we only license what our team of collection development librarians selects. At our core, we provide a collection, not simply a listing of individual books.
Although errors may be introduced in the process (we steadfastly attempt to avoid them and go so far as to maintain the original pagination and even line breaks of the books), the greatest error is having students use and cite unvetted content that lacks credibility as if it were the same as the content that is in the traditional library. Much like how the advent of the word processor vanquished the typewriter, the dawn of Internet search has surmounted the physical trip to the library. As educators, policy makers and thinking citizens, however, we must recognize that our society has a vital interest in bringing the high quality content that is available in our libraries online and making sure that students and researchers can tell the difference between something that has gone through the editorial process and been published from something that has not. At Questia, we are hard at work at just such a goal.
Troy Williams, President & CEO at Questia Media, Inc., at 5:00 pm EDT on April 29, 2005
Thanks for your response concerning the Internet and for introducing us to Questia Media.
I’m the author of 20 books and hundreds of journal and newspaper articles in some of the world’s best presses and journals. In a Questia search, I found only three minor citations of my work under “Michael J. Bugeja” and five other citations under “Michael Bugeja.”
A student looking to cite my works on Questia would go to Google.com where I am mostly blogged.
This is not to criticize Questia, which states that it provides “24/7 access to the world’s largest online collection of books and journal articles in the humanities and social sciences, plus magazine and newspaper articles.” But I hope your collection grows even larger so that it might contain a snippet of my work over 30 years.
Michael Bugeja, Director at Iowa State University of Science and Technology, at 8:12 am EDT on April 30, 2005
Interesting debate
You and Questia should not be arguing. As a teacher at a private high school who uses Questia, and a Shakespeare fanatic, I hate to see two groups who should be in support of each other exchanging barbs. To criticize Questia for not having everything is like criticizing any library for not having everything: you can, but unless you are at the Library of Congress or the Bodleian you really do not have a point. Questia adds things every day; I would compare it at this point to a mid-sized college library. For my students, who would otherwise be using a 20,000 volume collection, it is a godsend. The trouble I have with other teachers is that they think Questia is the Internet, which it is not. It is a LIBRARY, with books and journal articles. The difference is that you can search for a word in EVERY BOOK at the same time. If you search for Shakespeare, you get 31,052 sources that mention the name. If you search within those results for, say, “blood,” you get 18,997 times “blood” is used in books that mention Shakespeare. I am being simple, because finding what you want is different from paper searches, but the tool is phenomenal. We just had Nigel Wood, who has written a number of books and is general editor of the Theory in Practice series, here to lecture on Twelfth Night, which we were doing. I showed him Questia, and he was pleased with how OFTEN references to his work appeared. He could search his name and read a number of references to his work by other authors that he had never seen before. You guys are on the same side. Questia has to pay for rights like anyone else, and automatically credits authors. Everyone needs to lose preconceptions and figure out how to value intellectual property in the ever-more digital future.
Tom Keelan, Miami Country Day School, at 12:30 pm EDT on April 6, 2006