Can Artificial Intelligence Fix Social Science?
Concrete Evidence (March 23, 2026)
Artificial intelligence can do a lot of things that social scientists do. It can analyze data, write computer code, review existing code for errors, point the way to appropriate statistical methods for the question at hand, and give suggestions on study drafts. It can even start with a dataset and a research question and write a whole paper by itself. Given that human-led social science is a minefield of mistakes, dubious methods, ideological bias, and even outright fraud, one can hope that AI will make things much better in the future.
Some recent studies, though, highlight the limitations of current models. For the time being, AI is certainly a productivity- and quality-enhancing tool, but it’s not a panacea for what ails social science or a reason to let one’s guard down.
Perhaps the biggest social-science story of the week centers, oddly enough, on a 2021 study about the impact of tech clusters—cities with large numbers of inventors working in a given field—on innovation. A charity found this study helpful in its funding decisions and hired the economist Michael Wiebe to extend it. Digging into the materials, however, Wiebe found a series of technical and coding mistakes that, when corrected, undermined the study's conclusions. The American Economic Review (which also published the original study) has accepted Wiebe's comment detailing these issues.
More importantly for our purposes, Wiebe also tried feeding the materials into AI chatbots—two versions of ChatGPT, plus Refine, a special tool designed to improve academic work—asking them to scrutinize each key result to see if they'd detect these problems as well.
On the one hand, the bots were helpful. They all caught some issues, including a key coding error. Those working in social science should definitely be running their papers and code through AI and investigating any problems it unearths. It takes very little time.
On the other hand, the AIs missed a lot of problems too. And Wiebe notes that he was not testing for false positives—cases where the AI found a problem that didn’t exist. On that issue, another new study finds that AI “editing” of text often distorts the meaning. Personally, I can attest to having been falsely accused of a coding error by ChatGPT.
The lesson here: AI “peer review” can improve papers, yet you can’t necessarily trust a paper that’s been AI-vetted.
But what if we take the human-written paper and code out of the equation entirely, and simply give AIs the data and research question? If AIs can consistently find the best methods, apply them, and reach the correct conclusion, that would be a huge advantage. That's especially so because human research teams can do markedly different things even when applying the same data to the same question, a reality that stems from both ideological bias and simple differences in methodological choices.
Unfortunately, still another new study finds the same phenomenon in AIs. Working with 150 Claude Code agents from the Sonnet 4.6 and Opus 4.6 families, the researchers provided data from the New York Stock Exchange and asked the agents to answer various questions, such as whether daily trading volume, intra-day volatility, and the price impact of trades changed over time.
Some of these questions produced little variation in results. But others produced huge differences driven by subtle choices. For example, "trading volume" can be interpreted as dollar volume or share volume, and the two interpretations produced trends pointing in opposite directions. Similarly, the change in volatility depends heavily on whether raw or proportional changes are measured. Strikingly, different versions of Claude even had different "empirical styles," leaning toward certain modeling approaches and ways to measure variables (such as daily vs. monthly).
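To see how the dollar-vs.-share ambiguity can flip a trend, here is a minimal sketch with made-up numbers (not the study's data): if prices rise over time while the number of shares traded falls, the two readings of "trading volume" move in opposite directions.

```python
# Toy illustration with hypothetical figures: share volume falls
# over time while dollar volume rises, because prices rise faster
# than share counts fall.
years = [2000, 2010, 2020]
avg_price = [20.0, 60.0, 120.0]        # hypothetical average share price
shares_traded = [5_000, 3_500, 2_500]  # hypothetical daily shares traded

share_volume = shares_traded
dollar_volume = [p * s for p, s in zip(avg_price, shares_traded)]

print(share_volume)   # falls over time: [5000, 3500, 2500]
print(dollar_volume)  # rises over time: [100000.0, 210000.0, 300000.0]
```

Both series are "trading volume," yet an agent choosing one would report a decline and an agent choosing the other would report growth.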
Offering the AIs a peer review (from another AI) prompted some revisions, but no overall convergence toward similar results. However, when the researchers provided examples of top papers studying similar questions, the AIs often did imitate those methods and converge to similar results.
In other words, like humans, AIs will branch out and do things differently unless you nudge them all onto the same path. That might be helpful if you actually know the right path, but it’s clearly a big limitation if you want good results from AI without extensive steering from humans. After all, human social scientists’ fallibility and bias are why we’re so hopeful about AI to begin with!
And speaking of human bias, yet more new research traced the ideological orientation of academic research since 1960 (using AI, naturally, to classify the articles’ political valence). It concluded that “roughly 90 percent of politically relevant social science articles leaned left” during this period; that every discipline leaned left on average; and that every discipline has also moved left since 1990.
Such findings highlight the need for social science to come from a wider variety of perspectives, and with careful prompting, AI might serve that purpose. Yet they also reveal an obstacle: AI is trained on the existing body of human writing, biased science included. As Manhattan Institute reports from David Rozado have demonstrated, AI models often have left-leaning ideological priors and exhibit other biases as well, such as tending to pick the first of two options given.
AI can certainly find mistakes in humans’ work, and it can certainly pump out a lot of decent code and writing very quickly. Those are huge advantages that we shouldn’t minimize.
But for now, at least, the technology makes plenty of mistakes and carries ideological baggage of its own, and it doesn’t converge to consistent results when different bots are given the same question—unless heavily steered by the very humans whose foibles we’re trying to avoid.
From the Manhattan Institute
Daniel Di Martino has developed a really neat immigration calculator—decide what the rules should be, and check out the effect on the debt.
Josh Appel says New York City should pay for results when it comes to buying services from nonprofits.
Other Works of Note
A new tax in Massachusetts may also be driving people away.
How cultural changes helped humans survive in such a wide variety of environments.
Big new dataset—or rather an old one, taken off microfiche!—starting with tens of thousands of pregnancies that occurred from 1959 to 1974.
Some data on non-citizen welfare use. And what shapes how black Americans see immigration?
“I first document that Americans view immigrants as future Democrats who are culturally right-wing and economically left-wing. I then demonstrate that Americans’ receptiveness to immigrants, as well as judgments about their legal status and deservingness, are highly sensitive to whether newcomers are potential partisan allies or adversaries.”
Analyzing millions of public statements to measure how politicians use divisive rhetoric.
Which types of local police departments do the most immigration enforcement?
Body cameras increase the chance that “discourtesy and offensive language” allegations against cops are substantiated.
Disorder rises when the college kids are back in town.
Why do people want guns, and to what extent are gun owners open to non-lethal alternatives?
“While state-level variation in firearm ownership may impact fatal police shooting rates, it is unlikely that higher firearm ownership rates explain why some states have such large racial/ethnic disparities compared to others.”
Statistical risk assessments are a good way to decide who is let out while awaiting trial.
Going from 10 to 20 and from 20 to 40 are both increases of 100%, but what’s the percentage increase if we go from 0 to 10? When trying to analyze variables in proportional terms, people try all sorts of hacks to get around that impossible question, the choice of which can hugely change their results. A new study suggests they happen to land on solutions in the “sweet spot” where their results look strongest.
Obstacles for the abundance YIMBYs: This paper, popular on the left, argues that bringing down housing prices by deregulation would be too slow. Meanwhile, Pew research finds that most Americans prefer big houses in spread-out communities.
Should the Make America Healthy Again movement take its cues from the war on tobacco?
Does grade inflation reduce students’ future earnings?
New work on how school choice can increase integration.
Do employers respond to higher minimum wages by demanding more effort, increasing workplace injuries?
Big RAND Corporation project to map out the tax code.
Speaking of the tax code, you should probably start working on your taxes if you haven’t already. Have a good week!