Currently

Active Work

Research in progress at ZAS Berlin, at the intersection of LLM evaluation and computational pragmatics.

Accepted · CMCL 2026
LLM Calibration Metrics for Social Meaning Tasks
Developing two novel evaluation metrics — ESR (Evaluation Score Ratio) and CDS (Calibration Distance Score) — to assess how well frontier LLMs predict human social evaluations. Testing GPT-4, Claude, and Gemini across politeness, precision, and register tasks under varied prompting conditions.
LLM evaluation · calibration · social meaning · GPT · Claude · Gemini
→ Accepted at CMCL 2026 · follow-up targeting ACL/EMNLP
In Progress
Benchmarking LLMs Against Human Pragmatic Judgments
Building a benchmark of fine-grained pragmatic evaluation tasks — politeness, self-presentation, register choices — drawn from controlled human experiments. Goal: assess whether LLMs exhibit human-like calibration on social meaning or systematically diverge.
benchmarking · pragmatics · human evaluation · NLP
→ Targeting ACL / EMNLP
In Progress
Probabilistic Speaker Models of Social Meaning
Extending Rational Speech Act (RSA) and game-theoretic frameworks to model how speakers make strategic linguistic choices — including (im)precision, politeness, and register — and how these choices convey social meaning beyond literal content.
speaker modeling · RSA · pragmatics · register
→ Part of SFB 1412 Register project at ZAS Berlin
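For readers new to RSA, a minimal sketch of the standard (vanilla) model may help. It is the textbook formulation only, not the extended social-meaning model described above; the toy lexicon, the uniform prior, and the ALPHA value are illustrative assumptions.

```python
# Vanilla RSA on a toy scalar-implicature example.
# All names, the lexicon, the prior, and ALPHA are illustrative assumptions.

STATES = ["some_not_all", "all"]
UTTERANCES = ["some", "all"]

# Literal semantics: LEXICON[u][s] is 1.0 if utterance u is true in state s.
LEXICON = {
    "some": {"some_not_all": 1.0, "all": 1.0},
    "all": {"some_not_all": 0.0, "all": 1.0},
}
PRIOR = {"some_not_all": 0.5, "all": 0.5}  # uniform prior over states (assumed)
ALPHA = 1.0  # speaker rationality parameter (assumed)


def normalize(weights):
    """Rescale non-negative weights so they sum to 1."""
    total = sum(weights.values())
    return {k: (v / total if total > 0 else 0.0) for k, v in weights.items()}


def literal_listener(utterance):
    """L0: condition the prior on the utterance being literally true."""
    return normalize({s: LEXICON[utterance][s] * PRIOR[s] for s in STATES})


def speaker(state):
    """S1: soft-max choice over utterances, with zero utterance cost.

    p ** ALPHA equals exp(ALPHA * log p), the usual informativity utility.
    """
    return normalize(
        {u: literal_listener(u)[state] ** ALPHA for u in UTTERANCES}
    )


def pragmatic_listener(utterance):
    """L1: Bayesian inversion of the speaker model."""
    return normalize({s: speaker(s)[utterance] * PRIOR[s] for s in STATES})


if __name__ == "__main__":
    # Hearing "some", the pragmatic listener shifts belief toward "some but not all".
    print(pragmatic_listener("some"))
```

With these settings the pragmatic listener assigns more probability to "some but not all" after hearing "some" than the literal listener does, which is the classic scalar implicature pattern. Social dimensions such as politeness or register would need additional utility terms that this sketch does not include.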
In Progress
Strategic Choices in Indirect Requests
Modeling the pragmatic mechanisms behind indirect speech acts — when and why speakers choose indirect over direct requests, and what this reveals about the interplay of politeness, social context, and communicative efficiency.
indirect speech acts · game theory · pragmatics
→ Presented at Leibniz MMS Days 2025
Across 15+ Years

Research Themes

Recurring questions that have shaped my work across projects, methods, and institutions.

⚖️
Core Theme
LLM Evaluation & Calibration
Do large language models capture social meaning the way humans do? I develop metrics and benchmarks to test LLM calibration on fine-grained pragmatic tasks — politeness, precision, register — using human judgments as ground truth.
ESR · CDS metrics · CMCL 2026 (accepted)
💬
Core Theme
Pragmatics & Social Meaning
How do speakers use language strategically — choosing words not just for content but for social effect? I model politeness, (im)precision, and register as the outcomes of rational communicative choices with social consequences.
Linguistics & Philosophy 2021 · Linguistics Vanguard 2022 · ELM3 2024
🔁
Core Theme
Language Dynamics & Evolution
How do linguistic conventions emerge, spread, and change across populations? I use agent-based simulations and evolutionary game theory to model language change, grammaticalization, and the evolution of semantic ambiguity.
Synthese 2021 · Morphology 2017 · AI & Society 2019
🕸️
Core Theme
Signaling & Social Networks
How do signaling conventions form and propagate through social networks? I study the interplay of local interaction, network structure, and population-level behavior using multi-agent models and network theory.
Games 2022 · AI & Society 2019 · Ph.D. Thesis 2013
🎮
Core Theme
Game Theory & Formal Modeling
Game-theoretic frameworks — signaling games, evolutionary dynamics, reinforcement learning — provide precise tools for modeling the strategic dimensions of communication and the emergence of linguistic conventions.
Experimental Economics 2022 · Games 2019 · Fundamenta Informaticae 2018
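As a concrete illustration of how a signaling convention can emerge under evolutionary dynamics, here is a minimal sketch of a two-population, discrete-time replicator dynamic on a 2-state, 2-signal, 2-act Lewis signaling game. It is a generic textbook setup, not a model from the publications listed above; the starting populations, the step count, and all function names are assumptions chosen for illustration.

```python
# Discrete-time replicator dynamics in a 2x2x2 Lewis signaling game.
# Starting populations and step count are illustrative assumptions.
from itertools import product

STATES = SIGNALS = ACTS = (0, 1)

# A pure sender strategy maps each state to a signal;
# a pure receiver strategy maps each signal to an act.
SENDERS = list(product(SIGNALS, repeat=2))   # (0,0), (0,1), (1,0), (1,1)
RECEIVERS = list(product(ACTS, repeat=2))    # (0,0), (0,1), (1,0), (1,1)


def payoff(sender, receiver):
    """Expected success with equiprobable states: the act must match the state."""
    return sum(receiver[sender[t]] == t for t in STATES) / len(STATES)


def replicator_step(x, y):
    """One simultaneous update of sender shares x and receiver shares y."""
    fx = [sum(y[j] * payoff(s, r) for j, r in enumerate(RECEIVERS)) for s in SENDERS]
    fy = [sum(x[i] * payoff(s, r) for i, s in enumerate(SENDERS)) for r in RECEIVERS]
    mean_x = sum(xi * fi for xi, fi in zip(x, fx))
    mean_y = sum(yj * fj for yj, fj in zip(y, fy))
    new_x = [xi * fi / mean_x for xi, fi in zip(x, fx)]
    new_y = [yj * fj / mean_y for yj, fj in zip(y, fy)]
    return new_x, new_y


def simulate(x, y, steps=200):
    """Iterate the replicator map for a fixed number of steps."""
    for _ in range(steps):
        x, y = replicator_step(x, y)
    return x, y


# Senders start slightly biased toward the separating strategy (0, 1);
# receivers start uniform.
x0 = [0.2, 0.4, 0.2, 0.2]
y0 = [0.25, 0.25, 0.25, 0.25]
x_final, y_final = simulate(x0, y0)

if __name__ == "__main__":
    for strategy, share in zip(SENDERS, x_final):
        print("sender", strategy, round(share, 4))
    for strategy, share in zip(RECEIVERS, y_final):
        print("receiver", strategy, round(share, 4))
```

From this starting point both populations settle on the matching pair of strategies, sender (0, 1) and receiver (0, 1), which together form a signaling system. Other starting points, unequal state probabilities, or more states can produce different outcomes, including partial pooling.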
🧪
Core Theme
Experimental & Empirical Methods
Formal models need empirical grounding. I design behavioral experiments (LabVanced, Prolific) and apply statistical modeling (Python, R, mixed-effects) to test theoretical predictions against human data.
ELM3 2024 · Social Meaning Berlin 2023 · EvoSAL 2019–2021
Project History

Past Projects

Funded research projects and institutional affiliations that shaped the current work.

2020 – present · DFG · SFB 1412
Modeling Meaning-Driven Register Variation
As Senior Researcher in the collaborative research centre SFB 1412 "Register", I develop probabilistic models of register variation — investigating how speakers adapt their language choices to social context, and what this tells us about the strategic and social dimensions of meaning.
2019 – 2021 · NAWA · €75K
EvoSAL: The Evolution of Semantic Ambiguity in the Lab
As PI of this NAWA Ulam-funded project, I developed iterated learning models of semantic ambiguity and tested predictions experimentally. The computational models outperformed established theoretical baselines, with results published in Synthese (2021) and Games (2022).
2017 – 2019 · EU H2020
ODYCCEUS: Opinion Dynamics & Cultural Conflicts
As a postdoctoral researcher in this Horizon 2020 project, I designed reinforcement learning models combining cognitive frames and game-theoretic dynamics to predict cooperation in multi-agent interactions, introducing a new RL paradigm that outperformed existing models across five key metrics.
2013 – 2017 · ERC Advanced
EVOLAEMP: Language Evolution — The Empirical Turn
As a postdoctoral researcher in this ERC Advanced Grant project, I developed computational sociolinguistic models using agent-based simulation and network analysis. This period included visiting research stays at Stanford, UC Irvine, Ohio State, and Toulouse.
// Computational Methods
LLM evaluation — calibration metrics, prompt engineering, human-model comparison
Deep learning — PyTorch, HuggingFace Transformers, fine-tuning
Agent-based modeling — multi-agent simulations, evolutionary dynamics, NetLogo
Reinforcement learning — fictitious play, replicator dynamics, multi-agent RL
Statistical modeling — Bayesian inference, mixed-effects models, Python/R
// Theoretical Frameworks
Game-theoretic pragmatics — signaling games, RSA, strategic communication
Evolutionary linguistics — population dynamics, language change, iterated learning
Formal semantics — truth conditions, scalar implicature, register theory
Network theory — social networks, convention propagation, small-world models
Experimental pragmatics — behavioral experiments, LabVanced, Prolific