
Can We Predict Wars and Conflicts?

Antoine Herlin · March 2026 · 12 min read

View code on GitHub

Part 1

A deceptively hard question

Predicting when and where armed conflict will break out is one of the hardest forecasting problems that exist. Wars are rare, driven by complex political dynamics, and often triggered by events that no dataset captures in advance: a military coup, a disputed election, an assassination.

We will often be wrong. That is a given. But the question worth asking is not whether we can be right every time. It is whether we can measure conflict risk well enough to make better decisions. Can a government prioritize diplomatic resources? Can a company stress-test its supply chain exposure? Can an insurer price political risk more accurately?

If so, conflict forecasting is operationally useful, even when it is imperfect.

The accuracy trap

Conflict onset is rare. It happens in about 2% of country-years. A model that simply says “no conflict” every single time achieves 97.9% accuracy. It also has zero practical value.

  • 97.9% accuracy by always saying “no”
  • 0% of actual conflicts detected
  • 47:1 class imbalance ratio

This is why we score models with metrics that respect the imbalance instead of raw accuracy: AUC (can a model rank risky countries above safe ones?) and the Brier score, a proper scoring rule (are the probability estimates well-calibrated?).
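
To see the trap concretely, here is a small synthetic sketch (illustrative numbers, not our data): the always-“no” model posts ~98% accuracy while detecting nothing, and AUC exposes it immediately.

```python
# Synthetic illustration of the accuracy trap at a ~2% event rate.
import numpy as np
from sklearn.metrics import accuracy_score, recall_score, roc_auc_score

rng = np.random.default_rng(0)
y = (rng.random(10_000) < 0.021).astype(int)  # ~2.1% positives, ~47:1 imbalance

always_no = np.zeros_like(y)  # the degenerate "no conflict" model
print(f"accuracy: {accuracy_score(y, always_no):.3f}")  # ~0.979 -- looks great
print(f"recall:   {recall_score(y, always_no):.3f}")    # 0.000 -- catches nothing
print(f"AUC:      {roc_auc_score(y, always_no):.3f}")   # 0.500 -- no ranking skill
```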

Part 2

Can humans predict conflict?

It is well established that geopolitical forecasting is hard. Even trained forecasters struggle to beat simple baselines on most political questions. But can a disciplined crowd do better, specifically on war and conflict? We tested it on 238 resolved conflict questions from Metaculus.

  • 238 resolved questions
  • 72% resolved NO
  • 28% base rate of conflict events

The scoreboard

Brier score: lower is better. 0 = perfect, 0.25 = coin flip.

  • Metaculus crowd: 0.133
  • Base rate (“always 28%”): 0.202
  • Coin flip: 0.250
  • Doom predictor (“always 80%”): 0.471

The cost of alarmism

Since 72% of conflict questions resolve NO, being an alarmist is the most expensive bias you can have. A pundit who says “80% chance of war” on every question scores 0.471, nearly 2x worse than flipping a coin. Just knowing how rarely conflicts happen puts you ahead of most TV pundits.
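
The constant-forecast rows of the scoreboard are easy to verify by hand. A short check, assuming 67 of the 238 questions resolved YES (the count consistent with the published 28% base rate):

```python
# Expected Brier score of always forecasting p when the base rate is b:
#   b * (1 - p)**2 + (1 - b) * p**2
b = 67 / 238  # assumed YES count; 67/238 = 28.2%, matching the 28% base rate

def constant_brier(p: float) -> float:
    """Brier score of forecasting probability p on every question."""
    return b * (1 - p) ** 2 + (1 - b) * p ** 2

print(f"Base rate forecaster: {constant_brier(b):.3f}")     # 0.202
print(f"Coin flip:            {constant_brier(0.50):.3f}")  # 0.250
print(f"Doom predictor:       {constant_brier(0.80):.3f}")  # 0.471
```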

Part 3

What do statistical models see?

No ML model trained on annual structural data will be good enough for practical, real-time conflict prediction. The data is too sparse and too coarse, and it does not capture the fast-moving political events that actually trigger wars.

But models are still useful for two things: establishing base rates (which countries sit on a powder keg?) and identifying key drivers (what structural conditions make conflict more likely?).

We trained a progression of models on 165 countries from 1990 to 2023, from simple logistic regression to gradient-boosted ensembles, predicting three outcomes: conflict onset, escalation, and leadership transitions.

Conflict onset: model progression

AUC-ROC on temporal cross-validation (1999–2022). Higher = better at ranking risky vs. safe countries.

  • Always “no” baseline: 0.500
  • XGBoost (63 features): 0.675
  • Logistic (7 features): 0.758
  • Random Forest (63 features): 0.767
  • Multilevel partial pooling: 0.810

The best model, a multilevel logistic regression with partial pooling, achieves AUC 0.810 with only 7 features plus regional random effects. It outperforms every tree-based model, including the 63-feature XGBoost and Random Forest. Borrowing statistical strength across regions matters more than throwing more features at the problem.
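
For intuition, here is a minimal sketch of the partial-pooling structure in PyMC, run on synthetic data. The priors, feature count, and region coding are illustrative stand-ins, not our exact specification:

```python
# Multilevel logistic regression with partially pooled regional intercepts.
import numpy as np
import pymc as pm

rng = np.random.default_rng(0)
n, n_features, n_regions = 3000, 7, 8        # country-years, features, regions
X = rng.standard_normal((n, n_features))     # standardized structural features
region = rng.integers(0, n_regions, size=n)  # region code per country-year
y = rng.binomial(1, 0.02, size=n)            # placeholder ~2% onset labels

with pm.Model() as onset_model:
    beta = pm.Normal("beta", 0.0, 1.0, shape=n_features)
    # Partial pooling: regional intercepts share a common distribution, so
    # regions with few onsets are shrunk toward the global mean.
    mu = pm.Normal("mu", -4.0, 1.0)          # global log-odds of a rare event
    sigma = pm.HalfNormal("sigma", 1.0)
    alpha = pm.Normal("alpha", mu, sigma, shape=n_regions)
    logit_p = alpha[region] + pm.math.dot(X, beta)
    pm.Bernoulli("onset", logit_p=logit_p, observed=y)
    idata = pm.sample(1000, tune=1000, target_accept=0.9)
```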

XGBoost recovers on the other tasks: AUC 0.761 on escalation and 0.730 on leadership transitions, both best-in-class. The 2% onset base rate is the specific bottleneck where ensemble tree models struggle.
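
The temporal cross-validation behind these numbers follows a simple pattern: for each evaluation year, fit only on earlier years and score on that year. A sketch, assuming a tidy country-year DataFrame with a binary onset column (names are placeholders, not our dataset's schema):

```python
# Walk-forward evaluation: train strictly on the past, test on one year.
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

def temporal_cv_auc(df: pd.DataFrame, features: list[str],
                    start: int = 1999, end: int = 2022) -> float:
    aucs = []
    for year in range(start, end + 1):
        train, test = df[df["year"] < year], df[df["year"] == year]
        if test["onset"].nunique() < 2:  # AUC is undefined without both classes
            continue
        model = LogisticRegression(max_iter=1000, class_weight="balanced")
        model.fit(train[features], train["onset"])
        p = model.predict_proba(test[features])[:, 1]
        aucs.append(roc_auc_score(test["onset"], p))
    return float(np.mean(aucs))
```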

What drives conflict onset?

Across all models, the same structural factors consistently emerge as top predictors:

1. Duration of peace

By far the strongest predictor. Countries at peace for 15+ years rarely see new conflict. Recent peace is fragile. This “duration dependence” dominates feature importance in every model we tested; a sketch of how such a feature can be computed follows this list.

2. Institutional corruption

Higher corruption is consistently associated with higher onset risk. Weak institutions create the permissive environment in which grievances turn violent.

3. Ethnic exclusion

The share of the population belonging to excluded ethnic groups is a robust onset predictor, consistent with the political science literature (Cederman et al., 2010). Exclusion creates the fuel for conflict.

4. Political stability & civil liberties

Countries with low political stability scores and restricted civil liberties appear consistently in the top features across tree-based models.
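
As flagged under the first driver, here is one way a peace-duration feature can be derived from a country-year panel; the column names are placeholders, not our dataset's schema:

```python
# For each country-year, count the years of the current peace spell
# (0 in conflict years, then 1, 2, 3, ... while peace holds).
import pandas as pd

def add_peace_duration(df: pd.DataFrame) -> pd.DataFrame:
    df = df.sort_values(["country", "year"]).copy()

    def years_of_peace(conflict: pd.Series) -> pd.Series:
        spells = conflict.cumsum()  # a new spell starts after each conflict year
        return conflict.eq(0).groupby(spells).cumsum()

    df["peace_years"] = (
        df.groupby("country")["in_conflict"].transform(years_of_peace)
    )
    return df
```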

The kindling vs. the spark

Structural features tell you which forests are flammable. They cannot tell you when the fire starts. Onset triggers (coups, assassinations, rapid political deterioration) are inherently unpredictable from annual data. This is a fundamental ceiling for structural models, and why the human and AI forecasters discussed next are essential complements.

Part 4

Enter the AI forecasters

A raw LLM is a terrible forecaster. When Schoenegger and Park (2023) entered GPT-4 in a Metaculus tournament against 843 humans, its forecasts were statistically indistinguishable from a coin flip.

But give an LLM access to real-time information, structured reasoning, and calibration checks, and everything changes. We built an AI forecasting system and entered it in live Metaculus competitions. The results below were produced in real time, on unresolved questions, with no hindsight. We then evaluated them on 50 conflict questions with known outcomes.

AI vs. the crowd: head to head

  • AI Bot: 0.134 Brier score
  • Metaculus Crowd: 0.143 Brier score
  • Both: 84% directional accuracy
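
Both headline metrics fall straight out of the forecast/outcome pairs. Here, “directional accuracy” is read as being on the correct side of 50%, which is our interpretation of the term; variable names are illustrative:

```python
# Score aligned arrays of probabilistic forecasts p and 0/1 outcomes y.
import numpy as np

def brier(p: np.ndarray, y: np.ndarray) -> float:
    return float(np.mean((p - y) ** 2))

def directional_accuracy(p: np.ndarray, y: np.ndarray) -> float:
    # Fraction of questions where the forecast sat on the right side of 50%.
    return float(np.mean((p > 0.5) == y.astype(bool)))
```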

Where AI beats the crowd

AI wins when crowds get caught up in dramatic scenarios. The crowd overweighted spectacular possibilities; the AI anchored on base rates and structural constraints.

Will the Crimean Bridge be hit before Sept 2024?

AI: 31% · Crowd: 65% · Resolved NO

Will the ICC issue a warrant for Gallant before Sept 2024?

AI: 24% · Crowd: 55% · Resolved NO

Will Russia control Chasiv Yar on Oct 1, 2024?

AI: 35% · Crowd: 60% · Resolved NO

Will Ukraine's Kursk offensive reach other oblasts?

AI: 14% · Crowd: 47% · Resolved NO

Will Iran carry out a deadly attack within Israel before Sept 2024?

AI: 34% · Crowd: 48% · Resolved NO

Where AI gets it wrong

AI loses when it assigns too much probability to events the crowd correctly dismisses. It has a “probability floor” problem: where humans confidently say 5-10%, the AI hesitates and says 20-35%.

Will there be a large-scale armed conflict in Russia before Jan 2025?

AI: 31% · Crowd: 10% · Resolved NO

Could not confidently rule out an unlikely scenario

Will Donald Trump visit Russia before July 2025?

AI: 29% · Crowd: 8% · Resolved NO

Overweighted surface plausibility of diplomatic signals

Will a new nuclear-armed state emerge before Sept 2024?

AI: 22% · Crowd: 7% · Resolved NO

Found alarming articles on Iran's program, overcorrected

Will Iran announce a new capital location before Jan 2026?

AI: 33% · Crowd: 15% · Resolved NO

Confused theoretical risk (earthquake) with actionable policy

AI excels at

  • Deflating dramatic scenarios that sound plausible but rarely happen
  • Anchoring on base rates when emotions run high
  • Weighing structural constraints over narrative momentum

AI struggles with

  • Confidently ruling out unlikely events (stays at 20-35% when the answer is closer to 5%)
  • Separating what is theoretically possible from what is practically likely
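
One standard post-hoc remedy for this kind of under-confidence is extremization in log-odds space, which pushes hedged probabilities toward the extremes. This is a generic recalibration trick, not something our system necessarily applies:

```python
# Extremize probabilities: k > 1 sharpens, k = 1 leaves them unchanged.
import numpy as np

def extremize(p: np.ndarray, k: float = 1.5) -> np.ndarray:
    logit = np.log(p / (1 - p))  # assumes 0 < p < 1
    return 1 / (1 + np.exp(-k * logit))

print(extremize(np.array([0.25, 0.31, 0.50])))  # -> [~0.16, ~0.23, 0.50]
```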

Conclusions

What this means in practice

AI is closing the gap with the best humans

Our AI system matches the Metaculus crowd, a group that already outperforms Tetlock's superforecasters on conflict questions. The gap between machine and human is narrowing fast, and the combination of both outperforms either alone.

Scale changes the game

Humans produce world-class forecasts on the 50 questions that a curated crowd happens to cover. But what about the 50,000 niche questions nobody is watching? AI can handle complex, specific questions at scale and react within hours, not weeks, to new information. The real opportunity is not replacing top forecasters, but covering the vast space of questions they never get to.

A new way to handle uncertainty

The most valuable output of this work is not a binary “war or no war” call. It is calibrated probabilities with transparent reasoning. When a model says 12% risk and explains it through peace duration, corruption levels, and ethnic exclusion, decision-makers get something they can actually work with, even when the model is uncertain.

None of these alone is enough

Statistical models identify the structural kindling. Human crowds calibrate the near-term probability. AI fills the gaps, covering niche questions, anchoring on base rates, and processing information faster. The frontier is not a single model. It is a system that combines all three.
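
As a toy illustration of what combining the three signals could look like mechanically, here is a weighted pool in log-odds space; the weights are invented for illustration, not fitted values from this study:

```python
# Logarithmic opinion pool over three probability sources.
import numpy as np

def logit(p: float) -> float:
    return float(np.log(p / (1 - p)))

def combine(p_model: float, p_crowd: float, p_ai: float,
            w: tuple = (0.2, 0.4, 0.4)) -> float:
    z = w[0] * logit(p_model) + w[1] * logit(p_crowd) + w[2] * logit(p_ai)
    return float(1 / (1 + np.exp(-z)))

print(f"{combine(0.12, 0.20, 0.25):.2f}")  # ~0.20
```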

This research was produced for Boris Najman's course “Economics of Crisis and War” at Université Paris-Est Créteil. The full analysis covers 165 countries, 30+ years, 238 crowd-forecasting questions, and 50 AI predictions.

Full code and data on GitHub
