Why social science needs to embrace machine learning

Quick note: This is just the first post in a series; think of it as a short introduction.

I’m a social scientist, and unlike many in my field, I genuinely enjoy math and statistics. My friend Petar jokes that statistics isn’t real math, but I’ll leave that debate for another time.

What matters is that social science has long struggled to explain society’s complexity. We rely heavily on methods like surveys, interviews, and content analysis. These are useful, but they often fall short when it comes to predicting outcomes or uncovering causal relationships. Society is messy, and humans are unpredictable.

Yet, I believe we can do better if we embrace the power of machine learning (ML).

Machine learning is a branch of computer science that focuses on training machines to learn from data and perform specific tasks based on that learning.

In social science, that means building models that can, for example:

Identify latent groups based on attitudes,
Predict behaviors,
Or uncover hidden patterns in massive datasets.

Want to know what all people who vote for Party X have in common? Or whether education or family connections matter more for upward mobility? ML can help us get closer to those answers.
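To make the first item on that list concrete, here is a minimal sketch of finding latent groups with k-means clustering, written in plain Python. Everything in it is an assumption for illustration: the respondents are made up, and the two attitude scores (trust in institutions and support for redistribution, both on a 0-10 scale) are hypothetical variables, not taken from a real survey.

```python
import random

def kmeans(points, k, iterations=20, seed=0):
    """Plain k-means. Returns (centroids, labels), one label per point."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k randomly chosen respondents
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each respondent joins the nearest centroid.
        for i, p in enumerate(points):
            distances = [
                sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids
            ]
            labels[i] = distances.index(min(distances))
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return centroids, labels

# Hypothetical respondents: (trust in institutions, support for redistribution)
respondents = [
    (2, 8), (1, 9), (3, 7), (2, 9),   # low trust, high support
    (8, 2), (9, 1), (7, 3), (8, 1),   # high trust, low support
]

centroids, labels = kmeans(respondents, k=2)
for person, group in zip(respondents, labels):
    print(person, "-> group", group)
```

With real survey data you would have many more variables and far less tidy groups, so treat this as a picture of the idea rather than a recipe.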

Here are some great examples of ML being used in sociology:

We can also turn to Natural Language Processing (NLP) to analyze text data, such as public sentiment on social media. What used to take weeks of manual coding can now be done in minutes with topic modeling algorithms, once the data has been prepared. Social networks have given us unprecedented access to public opinion. It’s time we used that data intelligently.

In recent weeks, I’ve been working with European Social Survey data (more on that soon), and I’ve seen firsthand how models like Random Forest and XGBoost can help answer long-standing questions. For example, which is the better predictor of someone’s trust in the police: their general trust in others or their occupation?
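To show what such a comparison of predictors can look like, here is a small sketch using a Random Forest from scikit-learn. It does not use European Social Survey data: the numbers are simulated, the variable names are hypothetical, and the simulation deliberately makes police trust depend on general trust and not on occupation. The result therefore says nothing about the real world; it only shows the mechanics.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
n = 500

# Simulated predictors (hypothetical, not ESS variables).
general_trust = rng.uniform(0, 10, n)   # 0-10 scale
occupation = rng.integers(0, 5, n)      # five occupation codes, crudely numeric

# Assumption built into the fake data: police trust follows general trust
# plus noise, and occupation plays no role at all.
trust_police = 0.8 * general_trust + rng.normal(0, 1, n)

X = np.column_stack([general_trust, occupation])
model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X, trust_police)

# Impurity-based importances sum to 1; higher means the trees relied on it more.
importances = dict(zip(["general_trust", "occupation"], model.feature_importances_))
for name, value in importances.items():
    print(name, round(float(value), 3))
```

Impurity-based importances are only a rough guide, and a real analysis would encode occupation properly and check the model on held-out data before reading anything into the ranking.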

Machine learning helps us explore questions we’ve only scratched the surface of.

I also recently completed an NLP project analyzing speeches from the Serbian parliament, which produced interesting results.

But There Are Three Big Problems

Even though ML can benefit social science, we first need to learn how to use it. Right now, we face three serious obstacles:

1. Fear of Math and Technology

Many students choose social science thinking they can avoid math and technical skills. I understand why: it’s not something our field has traditionally emphasized. But if we want to stay relevant, that has to change. Math isn’t a talent you’re born with or something you “missed the boat on.” You can still learn it, and you can do so for free.

2. Low Computer Literacy

As a teaching assistant, I’ve noticed many students struggle with basic computer tasks, such as writing essays, making presentations, or navigating research tools. In my methodology classes, poor digital literacy has been a consistent obstacle. That needs urgent attention.

3. Access to Data

To build useful ML models, we need lots of quality data, and getting it isn’t always easy. Our best resources are public datasets such as government records and large-scale social surveys. Those are a great place to start, but we need to build a culture of data usage first.

Where Do We Go From Here?

We need to teach coding.

We need to embrace data.

We need to think beyond averages and p-values.

The world is changing fast. Our methods need to catch up.

So, step out of your comfort zone and play with data.

Start small, but start now.

The future of social science depends on it.
