Pitfalls to avoid when interpreting research studies on higher ed (opinion)

This is an exciting time for researchers who study the higher education market. Academics, think tanks, advocacy organizations and colleges themselves are producing and publicizing an unprecedented amount of information that could help us evaluate how well our current system is functioning and how we could improve outcomes in the future.

Some of this work uses excellent methods and data, and it should inform our national debate. Yet much of the work is of low quality. This variability complicates life for policy makers, for the people who work in and guide the nation's higher education institutions, and for families and students who want to make good decisions.

The rising tide of misleading "research" also poses a challenge for journalists who communicate this work to the public. News-media outlets wield an enormous amount of power over public opinion simply through the choice of which studies to cover and how to frame them.

Let's look at a few concrete examples, and then we will offer some tips on how to read the study du jour.

Complete College America has found that students who take 15 or more credit hours per semester are more likely to graduate, and they will do so in a shorter amount of time relative to students who take fewer credit hours per semester. This "evidence" suggests that colleges and policy makers should find ways to push students to take more credit hours each semester. States could offer a tuition discount on higher course loads (a carrot) or require more credit hours to receive financial aid (a stick).

Studies of this sort use correlations in the data to draw strong policy conclusions. But students who choose to take 12 credit hours or fewer per term aren't drawn randomly from the same population as students who take 15 or more. This is called "selection bias," and it makes any estimate of the relationship between credits taken and the likelihood of graduating unreliable. Taking fewer credits per semester elongates your college stay by definition. This study presumes that the reasons people have for choosing the longer route don't matter.

There are myriad reasons why students choose to take different course loads. Sometimes these reasons can be considered random or are explained by crude factors that researchers can control for, like Pell Grant status. But more often, these choices are made based on private information that researchers cannot easily see, such as a student's detailed knowledge of his or her own financial, health or family situation. Restricting the data to, say, Pell recipients doesn't make selection bias go away. In addition to holding private information, Pell recipients have varying family incomes, differ by age and family responsibilities, have different dependent status, and have greater or lesser access to institutions that match their needs.

Tying financial aid to a more stringent course load could work well, or it could lead students to earn lower grades and learn less, to take fewer rigorous courses, or even to drop out. We cannot tell by looking at the simple correlations offered by studies of this sort.
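To make the selection problem concrete, here is a minimal simulation sketch in Python. Everything in it is hypothetical: the `circumstances` variable stands in for the private information researchers cannot see, and the numbers are invented for illustration rather than drawn from any study. In this toy world, course load has no causal effect on graduation at all, yet the raw comparison still shows a large gap.

```python
import random


def simulate_selection_bias(n=100_000, seed=0):
    """Simulate students for whom course load has NO causal effect on graduation."""
    rng = random.Random(seed)
    full_load = []   # graduation outcomes of students taking 15+ credits
    light_load = []  # graduation outcomes of students taking fewer credits
    for _ in range(n):
        # Hypothetical private circumstances (finances, health, family),
        # invisible to the researcher. Higher means fewer constraints.
        circumstances = rng.gauss(0, 1)
        # Students with fewer constraints are more likely to take 15+ credits.
        takes_full_load = circumstances + rng.gauss(0, 1) > 0
        # Graduation depends ONLY on circumstances, never on course load.
        graduates = circumstances + rng.gauss(0, 1) > 0
        (full_load if takes_full_load else light_load).append(graduates)
    rate_full = sum(full_load) / len(full_load)
    rate_light = sum(light_load) / len(light_load)
    return rate_full, rate_light


rate_full, rate_light = simulate_selection_bias()
print(f"Graduation rate, 15+ credits:   {rate_full:.2f}")
print(f"Graduation rate, fewer credits: {rate_light:.2f}")
```

A reader who saw only the two printed rates might conclude that heavier course loads cause graduation. By construction, they do not; the gap comes entirely from who chooses which load.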

In general, there are two types of empirical studies: descriptive and causal. Descriptive research reveals what the world looks like or how it has changed over time. Using Census data, for instance, the College Board tells us that, in 1985, only about 36 percent of graduating high school seniors from the bottom quintile of the household income distribution enrolled in some form of postsecondary education. That percentage had risen to 58 percent by 2015.

Facts like these are important in their own right, and they set the stage for deeper questions. Yet even "facts" can be difficult to interpret. For example, if a study defines low-income background as "receives a Pell Grant," then a reader won't be able to discern whether differences over time in the population of low-income students are due to changing admissions behavior or changes in Pell Grant eligibility.

That brings us to studies that claim "Event X caused Event Y to happen." Doing causal research correctly is inherently difficult because researchers must construct a counterfactual for comparison: what would have happened if Event X had not occurred. Describing a counterfactual world requires the researcher to make assumptions that are not directly testable because that world never existed.

This is why randomized controlled trials offer the most convincing evidence of causality. For some higher education questions, researchers can create a random sample of the relevant population and then randomly assign individuals from the sample either to a "treated" group that gets, say, personalized information about college opportunities or a "control" group that does not. We can then measure the impact of the treatment and have confidence that the personalized information caused whatever improvement in college enrollment and outcomes is observed.
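A short sketch shows why random assignment helps. The setup is an assumption made for illustration: an invented unobserved `circumstances` variable drives baseline enrollment, and the treatment raises the chance of enrolling by a hypothetical five percentage points. Because a coin flip decides who gets the information, the hidden variable is balanced across the two groups, and a simple difference in enrollment rates lands close to the effect we built in.

```python
import random


def run_trial(n=100_000, true_effect=0.05, seed=1):
    """Estimate a treatment effect using random assignment."""
    rng = random.Random(seed)
    treated, control = [], []
    for _ in range(n):
        # Hypothetical unobserved circumstances that shape enrollment.
        circumstances = rng.gauss(0, 1)
        # A coin flip decides treatment, independent of circumstances.
        gets_info = rng.random() < 0.5
        # Baseline enrollment probability depends on circumstances.
        baseline = 0.6 if circumstances > 0 else 0.3
        probability = baseline + (true_effect if gets_info else 0.0)
        enrolls = rng.random() < probability
        (treated if gets_info else control).append(enrolls)
    # Difference in enrollment rates between the two groups.
    return sum(treated) / len(treated) - sum(control) / len(control)


estimate = run_trial()
print(f"Estimated effect of the information: {estimate:.3f}")
```

The estimate will not equal the built-in effect exactly, because sampling noise remains, but nothing about a student's private circumstances can explain the difference between the groups.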

For many vital questions -- such as whether more schooling "causes" higher income -- we can't run such an experiment. College and university review boards would take a dim view of any proposal to randomly assign high school seniors to college or to the workforce to see how their lives worked out.

That isn't a fatal problem for scholars looking for causal connections. Good research can mimic an experimental situation using data from the past. This often involves the use of so-called "natural experiments." A new college Promise program might affect only one city, so researchers can compare the outcomes of students there with those of similar students in adjacent cities who were unaffected by the policy.

Another convincing strategy is to use natural thresholds or discontinuities in important formulas, such as those used by many colleges in the admissions process. Such an approach allows researchers to compare the outcomes of students who are effectively identical except that, by random chance, some fell just below the bar and were rejected while others barely made the cut.
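The threshold idea can be sketched in a few lines. This is a deliberately crude version under invented assumptions: a hypothetical admissions `score`, a cutoff at zero, an outcome that rises smoothly with the score, and a built-in admission effect of 2.0. Published studies of this kind typically fit regressions on each side of the cutoff rather than comparing raw means, so treat the code as an illustration of the logic, not of standard practice.

```python
import random


def discontinuity_estimate(n=200_000, cutoff=0.0, bandwidth=0.05,
                           true_effect=2.0, seed=2):
    """Compare students just above and just below an admissions cutoff."""
    rng = random.Random(seed)
    near_above, near_below = [], []  # students within the bandwidth
    all_above, all_below = [], []    # every admitted / rejected student
    for _ in range(n):
        score = rng.gauss(0, 1)  # hypothetical admissions index
        admitted = score >= cutoff
        # The outcome rises smoothly with the score, plus a jump if admitted.
        outcome = (10 + 3 * score
                   + (true_effect if admitted else 0.0)
                   + rng.gauss(0, 1))
        (all_above if admitted else all_below).append(outcome)
        if abs(score - cutoff) <= bandwidth:
            (near_above if admitted else near_below).append(outcome)

    def mean(values):
        return sum(values) / len(values)

    near_cutoff = mean(near_above) - mean(near_below)
    naive = mean(all_above) - mean(all_below)
    return near_cutoff, naive


near_cutoff, naive = discontinuity_estimate()
print(f"Difference near the cutoff: {near_cutoff:.2f}")
print(f"Difference, all admitted vs. all rejected: {naive:.2f}")
```

Comparing all admitted students with all rejected students badly overstates the effect, because the two groups differ in far more than admission. Restricting attention to students near the bar brings the estimate much closer to the built-in value.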

Research that is poorly designed, by contrast, often has no statistical methodology to pin down causal connections and no grounding in social-science theory about how people behave. This type of work proliferates in part because it is easy to produce. With today's technology, anyone who knows how to put data in Excel and interpret a t-statistic can produce a "study."

Questions to Ask

Here are some simple questions every reader can ask of any study's methods.

Does the causal arrow necessarily run in the direction the study asserts? One recent "study" argues that better broadband access might "cause" higher income in regions that have built better broadband infrastructure. On the one hand, having better internet services might indeed give low-income families better access to information and opportunities, other things equal. On the other hand, people who live in communities with above average incomes probably demand better internet services. Perhaps higher incomes "cause" better internet services.

If a study finds a correlation between two things, have the researchers omitted other variables that might explain much of that correlation? Getting an extra year of schooling is highly correlated with labor-market earnings. But labor-market earnings and how much education a person gets are both strongly correlated with that person's talents or abilities. The simple correlation between years of schooling and earnings overstates what society could achieve by pushing the average student to get more education.

"Big Data" is not a substitute for a sound research design, because it's easy to make things worse by introducing bad controls. For example, imagine you want to estimate the magnitude of gender discrimination in wages. Controlling for individuals' occupations (asking, for example, "Are you a manager?") seems reasonable on its face. But if discrimination occurs through a lack of promotions for women, then controlling for occupation will understate the true level of discrimination in the labor market.

If a study compares two groups, are the groups randomly selected? In 2018, the average SAT score in Alabama was 113 points higher than in Connecticut. That's a big difference. But this difference does not tell us anything about the quality of schools in the two states. In Alabama, only 6 percent of high school seniors self-selected to take the SAT. In Connecticut all high school juniors are required to take it.
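The SAT comparison is easy to reproduce in a toy simulation. The sketch below makes two loud assumptions that are ours, not facts about either state: both hypothetical states draw students from the identical score distribution, and in the self-selecting state only the strongest students sit for the test. The score scale is invented, and the output is not meant to match the 113-point gap cited above.

```python
import random


def average_reported_score(share_tested, n=100_000, seed=3):
    """Average score among test takers when only the top share opts in."""
    rng = random.Random(seed)
    # Every state draws from the SAME hypothetical ability distribution.
    scores = sorted((rng.gauss(1000, 200) for _ in range(n)), reverse=True)
    # Assumption: the strongest students are the ones who choose to test.
    takers = scores[: max(1, int(n * share_tested))]
    return sum(takers) / len(takers)


selective = average_reported_score(share_tested=0.06)  # 6 percent opt in
universal = average_reported_score(share_tested=1.00)  # everyone is tested
print(f"Average among self-selected takers: {selective:.0f}")
print(f"Average when everyone is tested:    {universal:.0f}")
```

The two states are identical by construction, yet the self-selecting one reports a far higher average. The gap reflects who took the test, not the quality of anyone's schools.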

These are but a few examples of how readers can and should probe any study's research methods. The sheer volume of low-quality research doesn't bring us any closer to understanding the important tradeoffs we face. Nor does it help us decide in any objective way which policy levers to pull. Without tools to separate the signal coming from credible research design from the noise produced by spurious correlations, we risk drowning in meaningless "results" and succumbing to the soft temptations of confirmation bias.