Everybody Lies: Big Data, New Data, and What the Internet Can Tell Us About Who We Really Are by Seth Stephens-Davidowitz
Published in May of 2017.
Should every social scientist become a data scientist?
Is big data really changing everything? Economist and former Google data scientist Seth Stephens-Davidowitz thinks so. Not only is big data altering the landscape of advertising (Google and Facebook) and retail (Amazon), it also has the potential to revolutionize the social sciences.
In Everybody Lies, Stephens-Davidowitz claims:
“The next Foucault will be a data scientist. The next Freud will be a data scientist. The next Marx will be a data scientist.”
Everybody Lies is an engaging book-length argument that we are on the cusp of a Kuhnian paradigm shift in social science research methods. No longer will academics rely on small samples and unreliable attitudinal reporting to generalize their findings to the larger population. Instead, a new breed of social scientists trained in data science will leverage the power of search engine data to measure preferences and explain behavior.
And what will these future data/social scientists be researching? Mostly, to quote Mick Jagger, "sex and sex and sex and sex." Much of Everybody Lies is an examination of the porn preferences of populations by region, gender, age, race, and political persuasion, as measured through a combined analysis of Google and PornHub data.
Stephens-Davidowitz also looks at the extent of actual racism and sexism in the US, as opposed to the racism and sexism reported in surveys, and discovers through his Google analysis that the results of the 2016 presidential election were perhaps more predictable than the polling suggested.
My dissertation was on the working poor, and my study of this group's sexual proclivities was restricted to their total fertility rates. (I trained as a social demographer.) The data that I tortured were restricted to a comparatively small sample (at least compared to Google-sized data) of around 9,000 respondents. Imagine if I had had access to Google and PornHub data back in the mid-1990s. I could have researched not only the relationship between low wages and family formation, but also the porn surfing behavior of minimum-wage earners.
The question that I kept asking while reading Everybody Lies (which, by the way, is fascinating and probably a must-read for our analytics-obsessed generation of academics) is what sorts of social questions we can actually interrogate with big data. Going back to my dissertation example, I’m not sure if there is any utility in knowing the surfing habits of the working poor. What I wanted to know was what impact the growing number of low-wage workers in the U.S. would have on family formation. The data that I needed were about wages, marriage and cohabitation, and childbearing. None of these data are internet search data.
Everybody Lies makes a good case that big data analytical techniques should be taught in social science graduate programs, and that new PhDs should be as facile in analyzing the tens of millions of records in a Google data set as they are in teasing out conclusions from survey data. Further, Stephens-Davidowitz argues that the new social scientists need to become as comfortable conducting web-based A/B experiments as they are running small-scale behavioral experiments on undergraduates.
Maybe. I’m still trying to get my head around what it would have been like to have to learn data science while studying Durkheim, Weber, and Marx in grad school.
Are you seeing a shift in graduate school training away from traditional survey analysis and towards big data analysis?
What are you reading?