2026-05-22 | alignment, data, code

It's the humans, not the data: Geopolitical bias in LLMs originates in post-training, amplified by the language of the prompt

Stuart Bladon, Brinnae Bent

Key claim

Geopolitical bias originates from post-training, not pre-training.

The paper finds that geopolitical bias in language models is shaped mainly by post-training rather than pre-training. Notably, Alibaba's model shifted significantly toward China after post-training, which underscores the need for oversight of model alignment processes.

Novelty

8.0/10

The paper reveals that geopolitical bias is shaped during post-training, challenging prior assumptions.

Reliability

7.5/10

The study uses a robust methodology with multiple models and languages, although specifics on the evaluation metrics could be clearer.

Deep reliability assessment

The methodology supports the claim that geopolitical bias is introduced during post-training rather than pre-training, but the generalization across all labs is limited by the small sample size and the specific models tested.

Reproducibility

Yes. The paper states that the code and scenario bank are available at https://github.com/recozers/LLM-Bias.

Discussion questions

How might the findings change if larger or closed models were included in the study?
What are the implications of these findings for developers aiming to create unbiased language models?
What specific evidence would be needed to refute the claim that post-training introduces geopolitical bias?

Key figure

Figure 1 shows the change in China-favourability scores from base to post-trained models across different labs, highlighting the shift in bias direction.
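
The quantity plotted in Figure 1 is a per-lab difference between two scores, which can be sketched as below. This is a minimal illustration and not the authors' code: the names `ScorePair`, `favourability_shift`, and `shifts_by_lab` are hypothetical, the lab labels are anonymous, the score scale is assumed, and the numbers are placeholders rather than results from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScorePair:
    """China-favourability scores for one lab's base and post-trained model.

    The scale is an assumption made for illustration only.
    """
    base: float
    post_trained: float


def favourability_shift(pair: ScorePair) -> float:
    """Return post-trained minus base; positive means a shift toward China."""
    return pair.post_trained - pair.base


def shifts_by_lab(scores: dict[str, ScorePair]) -> dict[str, float]:
    """Compute the base-to-post-trained shift for every lab."""
    return {lab: favourability_shift(pair) for lab, pair in scores.items()}


# Placeholder values, NOT results from the paper.
example_scores = {
    "lab_a": ScorePair(base=0.0, post_trained=0.25),
    "lab_b": ScorePair(base=0.5, post_trained=0.25),
}

shifts = shifts_by_lab(example_scores)
for lab, delta in shifts.items():
    if delta > 0:
        direction = "toward China"
    elif delta < 0:
        direction = "away from China"
    else:
        direction = "no change"
    print(f"{lab}: {delta:+.2f} ({direction})")
```

Reporting the signed difference rather than the two raw scores makes the direction of the shift visible at a glance, which is the comparison the figure description emphasises.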

GitHub (1 repo)

recozers/LLM-Bias (Official)
