Learning Insights From Big Data

A field report.

March 10, 2014

Upon starting my position as a Research Fellow at HarvardX, a University-wide effort to use technology to transform teaching and learning on campus and online, I soon realized that I had an unprecedented opportunity, particularly as a researcher.

I had just finished lecturing for Stat 221, a core PhD-level course on computing and visualization in the Harvard Statistics Department. The class brought in 15 academic and business-sponsored partnerships on data-driven challenges.

It wasn’t enough to simply have a good solution or a solid theory for a problem; teams needed one that made sense in the marketplace. In the end, teams created great products such as a tablet app to predict traffic accident risk on the road in real time and an algorithm to score the relevance of 9 terabytes of Wikipedia documents.

Encouraged by my mentor Joe Blitzstein, a recognized innovator in online and on-campus pedagogy, I decided to bring to HarvardX a similar philosophy: blending rigorous scientific methods with smart “packaging,” or, basically, integrating data and findings into intuitive products utilizing modern technologies.

As researchers, we are comfortable with all kinds of numbers, plots, and theoretical concepts … but having to think about making findings as easy to use as an iPhone app is not necessarily something you get taught in graduate school.

But with MOOCs, whose very purpose is to expand access to all and to advance the science of learning, it seemed utterly necessary to make sure that my work went beyond rarefied academic journals.

With that as my mandate, I quickly embarked on several projects, including a Gates Foundation-sponsored study to leverage clickstream data to personalize massive open online courses (MOOCs) for users’ various learning goals. This work contributed to the inaugural set of Harvard-MIT working papers on the first 17 MOOCs offered via edX; the report is one of the most comprehensive studies of MOOC participant behavior to date.

At the same time, I was developing ways to visualize and manipulate the massive amount of data. Without seeing, for example, a map of gender breakdowns or of where completion rates were highest and lowest, the big data from MOOCs seemed too abstract and, most important, too prone to mis- or over-interpretation.

Altogether, this work resulted in what I called “Insights,” an open-source, interactive data analytics platform for online offerings that updates at regular intervals, close to real time.

The goal of the system from the outset was to bring intuitive research-grade data on all HarvardX online offerings to the fingertips of all stakeholders: researchers, course developers, leadership, reporters, and the general public. I also hoped that other institutions would start using the system to build a rich multi-institutional ensemble of relevant information.

After I implemented and demonstrated the initial prototype internally in August 2013, the project quickly gained support and momentum. Research team members contributed ideas, and HarvardX hired more help to scale the effort.

In fact, without collaboration, Insights would not have been possible. I relied upon user-designer experts (Konstantin Kashin and Qiuyi Han); fellow researchers and brave prototypers (like Daniel Seaton at MIT who adapted the system for MIT’s MOOCs); and of course, the big guns, like Jim Waldo, the CTO of Harvard, and open online research gurus Andrew Ho and Ike Chuang, of Harvard and MIT, respectively, who provided guidance on data storage infrastructure.

As a story in our local Harvard University newspaper proclaimed, just like building online courses, research about online courses also “takes a village.”

And yet, the real test was in the “marketplace”—or at least what other colleagues thought about the visualization effort.

The official release of the Insights system occurred jointly for MITx and HarvardX on February 20, 2014. The response from Insights users was overwhelmingly positive.

Course teams loved being able to track enrollment and user demographics through a course’s lifetime.

Researchers were excited to see their current research reflected visually, and to generate new hypotheses and directions for future work.

Leadership enjoyed an eagle-eye view of all online offerings, which enabled better decision-making (and some excellent PR moments, as when, during a meeting, they could roll over a country and show how many students were taking HarvardX courses).

And the media were surprisingly kind (for the most part) as well, especially given their rightful concerns about all the MOOC hype. They picked up on a theme I’d hoped they would: the commitment to transparency. They had no need to rely upon anyone’s spin to see a treasure trove of data.

Even better, once the tools were out there, others wanted to find ways to make them better and more useful. Our own data infrastructure team was able to correct idiosyncrasies in the data thanks to the near-real-time feedback loop provided by Insights. Several universities around the world, as well as edX, sent inquiries about adapting the system for their needs.

While working on Insights, I discovered that even the simplest data can be a powerful catalyst to innovation, as long as it is accurate, intuitive, and always up-to-date. Most important, the findings have to be accessible to the stakeholders (especially if they are not researchers).

I was surprised by how many conversations Insights helped spark, how many ideas were seeded after tracking data changes from one week to the next, and how the general sense of “operating in the dark” had begun to subside. MOOCs were finally, literally on the map in a meaningful way.

To me, this meant that Insights users felt like they were informed and in control of their data. In my mind, that’s a true win and a tenet that I would urge more researchers, in all data-intensive fields, to strive toward.

Going forward, I plan to continue creating intuitive and modern data-driven products in order to let the data speak and generate value. In addition to qualitative assessments, I envision best practices in teaching and learning, both online and on campus, becoming increasingly underpinned by real-time, intuitive data.

I want to stress that such an approach is not new, as companies have long used big data to help consumers find information and figure out what kind of appliances to buy, or to produce the now ubiquitous “we think you might like” lists.

Universities do not, however, have to become more and more corporate (a common fear in the MOOC age) to benefit from thinking about ways to make research more friendly, and usable, to a variety of audiences.

Working on Insights, I learned that opening up research data and presenting it to entirely new audiences in intuitive ways might offer surprises … and discoveries that can readily be reintegrated into the lab, into the classroom, and in my case, into my research career.

Sergiy O. Nesterko is a HarvardX Research Fellow working on adaptive media and gamification and a founder and principal at Theory, a place where research in statistics, machine learning, and interactive visualization connects to data-driven business, non-profit, and government problems.
