The data revolution continues to transform academia. New research venues showcase this shift, from conferences like the International Conference on Computational Social Science and Words in Numbers to journals like Digital Humanities Quarterly and Nature Human Behaviour. Research funding has helped fuel this transformation, with programs such as the National Science Foundation’s Harnessing the Data Revolution and the Moore Foundation’s Data-Driven Discovery Initiative investing hundreds of millions of dollars in advancing computational and data-intensive research.
Colleges and universities are investing heavily in data science education, launching dozens, if not hundreds, of new undergraduate programs in recent years. Even university administrators increasingly rely on data analytics to guide decision-making. But as academia embraces this data revolution against a backdrop of attacks on justice, a crucial question remains: Whom are we missing in our analyses?
When researchers overlook certain populations in their data sets, they don’t just miss information—they miss people. Consider a striking example from our recent research on Superfund, a federal program aimed at cleaning up the nation’s most contaminated land. Previous studies examining racial disparities in environmental remediation typically focused on Black and Hispanic communities or used aggregated “minority” categories. By expanding our analysis to explicitly include Asian populations—data that was available but traditionally excluded—we uncovered a troubling pattern: Areas with higher proportions of Asian residents were significantly less likely to see cleanup efforts completed. This disparity had remained hidden because researchers hadn’t thought to look for it.
Similarly, our analysis of New York City’s jail system during COVID-19 revealed how oversimplified data categories can mask critical inequities. The city’s public incarceration data uses just three racial categories—“Black,” “Asian” and “other”—obscuring the experiences of many racial and ethnic groups and limiting our ability to address disparities in the criminal legal system.
In our current political climate, data inclusion has become particularly urgent as marginalized communities face unprecedented attacks. Federal policy changes and executive orders have rolled back long-standing protections for environmental justice, nondiscriminatory hiring and transgender rights, including by redefining legal interpretations of gender.
At the state level, legislative efforts have further intensified these challenges. Women’s reproductive health-care access has been severely restricted across the nation. More than half the states have banned gender-affirming care for transgender youth, and many have passed laws limiting discussions of race in education. Meanwhile, disability rights advocates warn of escalating threats to accessibility and inclusion.
These challenges are particularly acute in higher education, where increasingly diverse student bodies face mounting obstacles to success. Restrictions on discussing race, gender and identity limit students’ ability to fully engage in their education. Limited data collection and oversimplified demographic categories can mask the unique challenges faced by different student groups, from access to health care and support services to experiences of discrimination and exclusion. As these personal burdens grow and resources dwindle, the need for a more nuanced understanding of student experiences becomes increasingly critical for developing effective support systems and policies.
In response to the challenges off campus and on, organizations are demonstrating the power of inclusive data collection. Trans Equality’s U.S. Trans Survey has become a gold standard for community-led data collection, providing crucial evidence of health-care disparities and discrimination. The Disability Data Dashboard at Brandeis University captures the experiences of disabled Americans often missed in federal surveys. The Williams Institute at the University of California, Los Angeles, combines rigorous data collection with legal scholarship to document LGBTQ+ experiences and inform policy. Meanwhile, Data for Black Lives shows how community-controlled data can advance racial justice, and the National Women’s Law Center maintains data tracking gender disparities in employment, housing, health care and more.
In higher education, what can we do? Drawing on quantitative critical theory—a framework pioneered by scholars including William Tate and Gloria Ladson-Billings in the 1990s and advanced by researchers like Tara Yosso, Daniel Solórzano and David Gillborn—we can examine how seemingly neutral technical choices in data collection and analysis can perpetuate or challenge systemic inequities.
First, question your defaults. Before accepting standard demographic categories or traditional data collection methods, ask yourself: Whom might these categories exclude? Consider how binary gender categories erase nonbinary experiences, or how racial categories might oversimplify multiracial identities. For instance, the Gender Identity in U.S. Surveillance group, a multidisciplinary group of experts convened by UCLA’s Williams Institute, has developed comprehensive guidelines for improving measures of gender identity in population-based surveys.
Second, engage with affected communities. Data categories should reflect how people identify themselves, not just administrative convenience. Researchers can partner with community organizations and advocacy groups to understand how their data choices might impact different populations. This approach can reveal blind spots in your methodology and help ensure your research serves the communities it studies.
Third, document your data choices transparently. When publishing research, explicitly discuss why you chose particular categories or exclusion criteria. This helps readers understand potential limitations and encourages other researchers to consider these issues in their own work. In our research on Rikers Island, we highlighted how the limited racial categories in New York City’s data system constrained our analysis and called for more nuanced data collection. We even wrote an op-ed about it.
Fourth, consider multiple analytical approaches. Different methods of categorizing and analyzing data can reveal different patterns. Our environmental justice research used multiple statistical models and geographic scales to ensure our findings weren’t artifacts of a particular analytical choice, revealing disparities that might otherwise have remained hidden.
Finally, embrace complexity. While simplified categories might make analysis easier, they can hide important nuances. For instance, both “Asian” and “Hispanic” categories encompass diverse populations with varying experiences and outcomes. When possible, use more detailed subcategories and acknowledge the limitations when you can’t.
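For readers who work with data directly, a minimal Python sketch may make this concrete. It is an illustration under stated assumptions, not the method used in our research: the group labels ("Group A", "A1" and so on) and every record below are invented placeholders, and the helper name rate_by is hypothetical. The sketch computes the same outcome rate twice, once by broad category and once by subcategory, so the two views can be compared side by side.

```python
from collections import defaultdict

# Invented toy records for illustration only; they describe no real population.
# Each record is (broad_category, subcategory, outcome), where outcome is 1 or 0.
RECORDS = [
    ("Group A", "A1", 1), ("Group A", "A1", 1),
    ("Group A", "A1", 1), ("Group A", "A1", 1),
    ("Group A", "A2", 0), ("Group A", "A2", 0),
    ("Group A", "A2", 0), ("Group A", "A2", 1),
    ("Group B", "B1", 1), ("Group B", "B1", 0),
    ("Group B", "B1", 1), ("Group B", "B1", 0),
]


def rate_by(records, key_index):
    """Return the share of records with outcome == 1 for each group.

    key_index selects the grouping column:
    0 for the broad category, 1 for the subcategory.
    """
    counts = defaultdict(lambda: [0, 0])  # group -> [positive outcomes, total]
    for record in records:
        group = record[key_index]
        counts[group][0] += record[2]
        counts[group][1] += 1
    return {group: pos / total for group, (pos, total) in counts.items()}


broad = rate_by(RECORDS, 0)     # rates by broad category
detailed = rate_by(RECORDS, 1)  # rates by subcategory

for group, rate in sorted(broad.items()):
    print(f"broad    {group}: {rate:.3f}")
for group, rate in sorted(detailed.items()):
    print(f"detailed {group}: {rate:.3f}")
```

In this toy data, the broad "Group A" rate of 0.625 sits between subgroup rates of 1.0 and 0.25, a gap that reporting only the aggregate would hide. With real data, small subgroups also raise privacy and statistical-power concerns, so disaggregation should be paired with the transparent documentation described above.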
These steps aren’t just technical adjustments—they’re fundamental to conducting ethical, rigorous research that serves all communities. As higher education grapples with questions of equity and inclusion, we must recognize that data practices are not neutral. They reflect and potentially reinforce existing social hierarchies and biases.
Department chairs, deans and research leaders should build institutional support for more inclusive data practices. This support could include developing guidelines for demographic data collection, providing resources for community engagement and more comprehensive data analysis, and recognizing the additional effort required for these inclusive measures in tenure and promotion decisions.
The stakes are high. When we fail to count certain populations or oversimplify their experiences in our data, we risk perpetuating their marginalization in policy and practice. As researchers and teachers, we have a responsibility to ensure our work illuminates rather than obscures, includes rather than excludes.
Our data choices matter because people matter. Academia must make these choices more thoughtfully and inclusively.