Publications

Published

Extractive versus Generative Language Models for Political Conflict Text Classification with P.T. Brandt, S. Alsarra, V. D’Orazio, D. Heintze, L. Khan, J. Osorio & M. Sianan Political Analysis, 2025 Link | Slides | Code We develop specialized language models for political conflict analysis that outperform general-purpose LLMs like Gemma 2, Llama 3.1, and Qwen 2.5 in accuracy, precision, and recall while being hundreds of times faster.

ConflLlama: Domain-Specific Adaptation of Large Language Models for Conflict Event Classification with P.T. Brandt Research & Politics, 2025 Link | Slides | Code We introduce ConflLlama, a specialized variant of Llama 3.1 fine-tuned for political conflict classification, demonstrating superior performance in event coding and conflict analysis tasks compared to traditional approaches.

Public Health Advocacy in Times of Pandemic: An Analysis of the Medicare-For-All Debate on Twitter During COVID-19 with S. Kumar & P. Zhang Behavioral Sciences, 2025 Link | Code We analyze how health advocacy groups adapted their Medicare-For-All messaging on Twitter during the COVID-19 pandemic, revealing distinct approaches to public engagement and narrative adaptation.

Under Review

Strategic Anticipation and Expert Review: The Determinants of Critical Advisory Opinions from the Dutch Council of State with C. Egger & A. Zhelyazkova Drawing on 2,898 advisory opinions issued by the Dutch Council of State (2004–2025), including 2,111 opinions on laws and 785 on executive decrees, we test five competing mechanisms. Instrument type is the strongest predictor of severity: executive decrees receive dramatically milder treatment (69% drawing no objections versus 13% for laws). Among laws, junior coalition partners attract significantly less critical opinions than leading parties, consistent with anticipatory compliance. Ideological distance from the cabinet median is positively associated with severity, though the effect is modest.

Framework Laws and Fading Oversight: How the Dutch Executive Expanded Its Lawmaking Authority with C. Egger & A. Zhelyazkova Examining executive predominance in the Netherlands using a legislative dataset spanning 2000–2025. Roughly 75% of legislation now takes the form of secondary executive acts, while MPs retreat toward weaker oversight tools such as motions and parliamentary questions.

Electoral Accountability and Government Content Removal: Theory and Evidence from Google Transparency Reports with P. Zhang How does regime type shape a state’s intervention in its information environment? We develop a political agency model showing that democratic governments refrain from direct takedowns and delegate to courts because electoral accountability disciplines politicians through reputation-building incentives. Using Google transparency reports and quasi-experimental variation in election timing, we show that takedown requests from democratic governments decline significantly as elections approach, a reputational discipline effect that is absent in authoritarian regimes and for other request types.

Two Types of Censorship? An Assessment of the Informational Autocracy Thesis with P. Zhang Testing how Guriev & Treisman’s Informational Autocracy theory applies to internet filtering practices in autocratic nations.

Build, Borrow, or Just Fine-Tune? A Political Scientist’s Guide to Choosing NLP Models Draft [PDF] | Code A practical decision framework for political scientists choosing among domain-pretrained, fine-tuned, and commercial LLM approaches for text classification. Fine-tuning general-purpose encoders matches the performance of domain-pretrained alternatives and outperforms commercial APIs at a fraction of the cost.

Working Papers

Using Messy Text in Future LLM Annotations with P.T. Brandt, P. Cuellar-Cuellar & X. Zhao When working with real-world text, researchers often inherit corpora and annotations that embody costly human judgments but were not collected, organized, or annotated for modern text-as-data workflows. We propose codebook attention: a codebook-guided, turn-by-turn extractive summarization pipeline that reuses these valuable annotations without requiring full re-annotation. For each victim in a corpus of disappearance reports from Mexico, the model processes documents sequentially, extracts verbatim evidence spans for each codebook category, and updates a running structured summary. We evaluate the resulting summaries by classifying them and comparing the predicted labels with human annotations across multiple open-source models, including Llama 3.1, Gemma 3, and Ministral 3. Results suggest that hours of computation can approximate weeks of human coding effort while producing evidence-traceable summaries suitable for downstream human rights and event data research.

The Attention Trap: How International Celebrity Intervention Displaces Policy Discourse in Protest Movements Draft [PDF] Does international celebrity attention help or hinder social movements? Analysing 1.08 million tweets from the 2020–2021 Indian farmer protests, I measure the effect of celebrity attention shocks on protest discourse. When Rihanna tweeted about the protests on 2 February 2021, daily volume rose 22-fold, but the share of tweets discussing policy issues fell by 10 percentage points while the share devoted to celebrity and meta-commentary rose by 45 percentage points. I call this the attention trap: celebrity intervention maximises visibility while displacing substantive discourse. Local projection impulse responses show this displacement is specific to the celebrity channel; domestic events, including state violence at Lakhimpur Kheri, generate policy-focused responses without comparable displacement. The cost was strategically exploitable: the BJP launched a coordinated nationalist counter-campaign within hours, on discursive terrain made favourable by the displacement itself.

Legislating Without Scrutiny: Executive Aggrandizement and Democratic Erosion in India Draft [PDF] Committee referral rates for legislation collapsed from 73% to 6% under India’s NDA government, and median passage time fell from 268 to 11 days. Combining text analysis of 881 central acts with parliamentary data on 1,008 bills and synthetic control methods, I document systematic executive aggrandizement operating through procedural bypass rather than constitutional change. All eight V-Dem democracy indicators declined significantly after 2014, a pattern corroborated by a synthetic control showing that judicial constraints fell relative to comparable democracies (ATT = -0.051, p = 0.024).

Populism and Investment Treaties with C. Peinhardt Investigating how nationalist rhetoric influences countries’ decisions to exit international investment agreements.

IGO Withdrawal Networks Extending von Borzyskowski & Vabulas (2019) to examine network effects in withdrawals from international organizations.

Media Polarization in International News Coverage with A. Khalid & K. Park Using machine learning to analyze ideological patterns in international news coverage across 200,000+ articles.

Event Horizon: Revolutionizing Data Annotation with Reinforcement Learning Model A novel reinforcement learning framework, built on DeepSeek’s Group Relative Policy Optimization (GRPO), that brings advanced AI to political science by producing structured, transparent annotations for complex events.

Dissertation

Digital Sovereignty: The Political Economy of Internet Governance University of Texas at Dallas, 2025 Slides