It’s official: we now live in “the Age of Big Data.” So proclaimed the lead article in the Sunday Review section of the NY Times this past week (12 Feb 2012). Noting “a drift towards data-driven discovery and decision-making,” the Times quotes Gary King, director of Harvard’s Institute for Quantitative Social Science, to set the context: “We’re really just getting under way. But the march of quantification, made possible by enormous new sources of data, will sweep through academia, business and government. There is no area that is going to be untouched.”
The good news for academe is that the “Age of Big Data” means rising demand for training and certification. A 2011 McKinsey Global Institute report estimates that the “big data” initiatives of US firms will require 140,000-190,000 knowledge workers with “deep analytical” skills plus some 1.5 million “data literate” managers. (For comparison, in fall 2008 some 196,000 students were enrolled in graduate programs in engineering, physical sciences, and mathematics; US colleges and universities awarded 1.6 million bachelor’s degrees in A/Y 2008-09. Source: US Dept. of Education, Digest of Education Statistics 2010, Chapter 3.) The demand for big data skills could be a catalyst for the launch of dozens (hundreds?) of new courses, as well as new certificate and degree programs at colleges and universities across the country. (The McKinsey report, Big Data: The Next Frontier for Innovation, Competition, and Productivity, is available in PDF, Kindle, and ePub formats from the McKinsey web site.)
The bad (less good?) news for higher education is that (a) colleges and universities sit on huge amounts of untapped archival and transactional data about student learning and campus operations; and (b) academic organizations do not have a great history of using data to aid and inform decision-making. For example, less than two-fifths (35.9 percent) of the 956 college presidents who participated in Inside Higher Ed’s 2011 Presidential Perspectives survey and less than a third of the 1081 provosts who completed the December 2011 Inside Higher Ed survey of chief academic officers report that their institutions are “very effective” in “using data to aid and inform campus decision-making.” The two Inside Higher Ed surveys seem to affirm the assessment in the McKinsey report that education is one of several sectors where the effort to exploit the power and potential of big data will “face strong systemic barriers.”
Yet big data projects, some under the tag of “action analytics,” seem to be proliferating. Several states and multicampus systems are aggregating transcript data from student information systems and transactional data from Learning Management Systems as part of the post-Spellings Commission effort to assess student learning and institutional outcomes and to stem student attrition. Concurrently, an emerging cadre of well-trained and certified professionals who have taken an energetic bite from the analytical apple is coming forward to proclaim both their expertise with big data and the power of analytical tools to transform higher education.
Yet truth be told, big data is not new to academe. As a grad student in the late 1970s, I was fortunate to have Alexander W. Astin as my dissertation advisor and professional mentor. Few would question that Astin is the individual who launched the first “big data” projects in American higher education.
During the 1970s, Astin developed new research models and exploited the power of multivariate analysis as part of his continuing efforts to understand the impact of college on students. Drawing on big data – follow-up surveys of the literally hundreds of thousands of college students who participated in the annual CIRP survey of college freshmen – Astin’s work, particularly his 1977 book Four Critical Years and his 1993 follow-up study, What Matters in College: Four Critical Years Revisited, provides the critical foundation for much of what we know about the impact of the college experience on students and the behavioral and institutional factors that aid or impede retention, degree completion, academic performance, and other critical student and institutional outcomes.
Admittedly, I learned much about data and analytics from Astin, first as his advisee and later as the associate director of UCLA’s Higher Education Research Institute, where I worked with him. And one of the most critical lessons was that thoughtful, useful, and actionable analysis requires attention to both numbers and nuance – the critical importance of attending to context as part of the analytical task. Absent attention to context, analytics may not be actionable.
For example, the conventional way to run a regression analysis (a method for identifying and isolating statistically significant impacts) for academic performance or degree completion would be to feed all the data (student records that include demographic, transcript, and other metrics) into the computer. What emerges from the regression analysis is an algorithm that provides, in rank order, the factors that have a statistically significant impact on the selected outcome, such as freshman persistence, retention within STEM majors, or degree completion. The problem with conventional practice is that the regression algorithm may, at best, account for just 50 percent of the variance in the selected outcome. Moreover, demographic factors such as gender, ethnicity, and socioeconomic status often account for a large proportion of the variables in the algorithm. This is analysis that offers little opportunity for action.
Astin understood that researchers and campus officials had to get past the demographic variables that could not be changed to identify key environmental and experiential factors that influence student outcomes – factors that colleges and universities could address by practice and through policy. So rather than running the data for all students, Astin would run separate cuts for different populations: students attending residential vs. commuter institutions, undergraduates at elite vs. less selective colleges, STEM vs. humanities and social science majors, etc. The analysis profiling different student populations and different kinds of campuses provided useful data, information, and insight into the aggregate experience of undergraduates as well as the variations in the experiences of different types of students at different kinds of colleges and universities.
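To make the contrast between one pooled run and separate cuts concrete, here is a minimal sketch in Python. It is an illustration only, not a reconstruction of Astin's actual models or of CIRP data: the records are synthetic, and the variable names (residential, engagement, gpa), the two slopes, and the helper fit_ols are all assumptions invented for the example. The sketch assumes a single "environmental" factor whose effect on the outcome differs between residential and commuter students.

```python
import numpy as np


def fit_ols(X, y):
    """Ordinary least squares; returns (coefficients with intercept first, R^2)."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    centered = y - y.mean()
    r2 = 1.0 - (resid @ resid) / (centered @ centered)
    return coef, r2


# Synthetic student records (hypothetical, for illustration only).
rng = np.random.default_rng(0)
n = 2000
residential = rng.integers(0, 2, size=n)   # 1 = residential, 0 = commuter
engagement = rng.normal(0.0, 1.0, size=n)  # assumed environmental factor

# Assumption: engagement matters more on residential campuses.
true_slope = np.where(residential == 1, 0.8, 0.2)
gpa = 3.0 + true_slope * engagement + rng.normal(0.0, 0.5, size=n)

# Conventional practice: one regression over all students.
pooled_coef, pooled_r2 = fit_ols(engagement, gpa)
print(f"pooled: slope={pooled_coef[1]:.2f}, R^2={pooled_r2:.2f}")

# Separate cuts: one regression per population.
cuts = {}
for label, flag in (("residential", 1), ("commuter", 0)):
    mask = residential == flag
    cuts[label] = fit_ols(engagement[mask], gpa[mask])
    coef, r2 = cuts[label]
    print(f"{label}: slope={coef[1]:.2f}, R^2={r2:.2f}")
```

In this toy setup the pooled slope lands between the two group slopes, so the single run hides the fact that the same factor matters a great deal for one population and much less for the other. The separate cuts surface that difference, which is the kind of context a campus could act on.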
Astin viewed the numbers with an eye to context: his long work with and deep understanding of higher education brought insight to the quantitative analysis and made the analytics actionable.
And this is what concerns me as I listen to the growing number of conversations about big data in academe: lots of smart people, both in and around higher education, who advocate for the numbers but who may (or may not) know or understand the critical nuances that provide insight into the numbers.
Let me be clear: this is not an argument against big data. Like you, esteemed reader, I’m painfully aware that much of what is often offered as evidence in campus planning and policy discussions is really based on opinion or epiphany. Consequently, I’m advocating for the thoughtful use of big data and the need to attend to nuance as part of the effort to bring big data analytics – actionable analytics – to academe.