A long walk through the English countryside and the current flap over government surveillance of cell phone records touched off my deeply held and unreasoned Luddite reaction to "big data." Like most over-hyped trends, the surge of interest in big data and its applications provokes ennui among those of us with some mileage on our sneakers. Gary King of Harvard says that with all the available "big data," students in their freshman year can be given a personalized plan to achieve their lifetime career goals. Harvard Business Review claims that data science is the sexiest new profession. Every day brings more media hyperbole about the application of big data to commercial, political, and scientific enterprises. While some skeptics have surfaced, the mainstream press continues its love affair with big data.
The long walk I recently took through the English countryside (200 miles in two weeks) reminded me of the value of limited information and gave me unencumbered space to think about my oddly blinkered view of big data. Collecting and analyzing data is, after all, how I have made a living for 30 years. Data remain, to me, the only icon of science left largely unsullied by politics, ego, and money. Perhaps I am just jealous; as HBR suggested, the old guard of statisticians, survey methodologists, and data analysts is not equipped to join the brave new world of big data.
What convinced me otherwise was the way my husband and I recently managed to mostly avoid getting lost on the famous yet poorly marked Coast to Coast walk through the English Lake District and the Yorkshire Moors. We used a $1.50 plastic compass, Ordnance Survey maps, a highly schematic guidebook, and each other. No GPS, no Google Maps, no iPad or iPhone, no turn-by-turn directions. The simple tools of "compass, map, and thou" are based on substantial abstractions of geographic reality and are subject to errors of judgment and interpretation. More detailed information would have overwhelmed us as we tried to avoid deep bogs, animal excrement, and slippery precipices in the fog and rain. Decisions made with paper maps, trust, and a little visual triangulation kept us true to our course 90 percent of the time.
And so to big data… The history of science is actually one of reverse engineering. In the beginning, our measurement tools for the physical and social world were so crude that the combination of substantial abstraction and painstaking taxonomic description was the only choice. The grand theories of natural selection and relativity emerged at a time when data were very sparse and poorly collected. To have any reasoned explanation of the world, scientists of earlier eras had to accept that the empirical world they could observe was quite limited and distorted. Over time, improvements in our tools have allowed us to anchor and refine those grand abstractions with a reality closer to what is observed. Still, the world comes to us through a glass, darkly. Until very recently, we have continued to rely on substantial abstraction to see and understand natural and social phenomena.
The problem with big data is that it is like trying to take a sip of water from a fire hose. "Big" data is really a euphemism for all of the data thrown off by the digital engines that drive our economic and social transactions: electronic medical records, arrest and conviction records, loyalty card data from the grocery store, everything you tell OkCupid and Match.com, Google search histories, insurance claims, cell phone calls, and even the digital things we create, like tweets and blog posts.
Any transaction, business process, or social engagement that uses a machine to record, count, and store stuff in a digital format generates data. People and institutions now leave digital footprints everywhere. We used to have to ask questions or collect paper records. Now it is as if a universal bar code had been slapped on the back of every person and business in the world: every time they do something, the big bar code scanner in the sky records it and stores it. Data no longer represent reality; they are the reality.
The problem, of course, is that we have almost come full circle. Rather than too little data, poorly measured, we now have too much data, precisely measured. Our ability to use data effectively to make decisions or understand the world depends on our ability to see patterns and abstract from those patterns. Big data is, in many ways, an exact replica of reality. Using big data to make decisions is like using every square inch of soil, landscape, and sky on my 200-mile walk across England to figure out how to get around the corner in the next small village. It feels to me as if we need to return to the time of Linnaeus, the famous Swedish botanist whose pioneering classification of the natural world gave us the concept of the "species," and classify the intersecting and complexly nuanced world thrown off by our digital engines before we start making decisions using this unknown commodity. We need to rebuild those high-level abstractions from the ground up to make sense of this new reality.
My difficulty with at least the political and commercial applications of big data is that our tools of abstraction and decision-making are decidedly underdeveloped when faced with this type of data. As long as Netflix doesn't understand that I share my account with my daughters, who are in their early 20s, its big data application will continue to recommend "Buffy the Vampire Slayer" and "Gossip Girl" to me when my real preferences run to "Masterpiece Theatre" and subtitled films. On a more serious note, our real fear about the use of cell phone transaction data to understand the social networks of individuals is not necessarily the invasion of privacy but the possibility that the wrong person will be identified as a threat because his or her data are taken out of context. The question is no longer whether our data are adequate to support our theories but whether we have developed adequate theories to explain our highly nuanced data.
Or maybe I am just jealous that Google hasn’t come looking for me... yet.
Felicia B. LeClere is a senior fellow with NORC at the University of Chicago, where she works as research coordinator on multiple projects. She has 20 years of experience in survey design and practice, with particular interest in data dissemination and the support of scientific research through the development of scientific infrastructure.
Southern University in Baton Rouge eliminated the job of Dong Sheng Guo, a physics professor, in early 2012 as part of a round of budget cuts, but he went on teaching that fall and the following semester as well, The Baton Rouge Advocate reported. Guo says that he was never formally notified of his dismissal and only learned that his job had been eliminated when he went to the human resources office to ask why he was not being paid. It is unclear how he was assigned class sections when the university believed his position had been eliminated. Guo is now appealing to get his job back.
A professor of English at the Virginia Military Institute is on paid leave indefinitely, following his refusal to work or quit, the Roanoke Times reported.
Kurt Ayau was one of seven professors who took issue last year with department leaders and affairs, including a new curriculum. Six have resigned or retired, but Ayau said the institute offered him a leave of absence for what he understood to be one year, and he took it to support himself as he looks for another job, according to the Times.
An institute spokesman said Ayau was on paid leave, but that the timeline was undetermined. Ayau’s salary is $59,642. The spokesman declined to comment on why Ayau was offered a leave of absence, citing personnel reasons. The institute’s Faculty Handbook says that extended leaves may be granted when “in the best interests of the faculty member and the Institute.”
Ayau did not return an e-mailed request for comment.
Judges are speaking out against two law professors -- once a couple -- whose divorce and post-divorce litigation has taken up court time for the last 17 years, USA Today reported. The parties are Christo Lassiter, a law professor at the University of Cincinnati, and his former wife, Sharlene Boltz, a law professor at Northern Kentucky University. Judges have criticized both for their approach to the divorce, for allegedly breaking court rules and for using up court time. In a hearing last month, one Ohio judge said, "I am really shocked, because when I was in law school my professors were outstanding. They never would have told me that behaving the way you all have, both of you, over the past 20 years, is acceptable behavior."
The Central Intelligence Agency has for years denied that it had a file on Noam Chomsky, the Massachusetts Institute of Technology professor known both for his contributions to the field of linguistics and (perhaps of more interest to the CIA) his criticism of the U.S. government across many administrations. Now, in response to a Freedom of Information Act request, documents have confirmed that the CIA did have a file on Chomsky, and that it may have been scrubbed. The details are in Foreign Policy.