Research

Welcome to my research page. Here, you’ll find a selection of my work on various topics within political science and computational linguistics.

Working Papers

Democracy and Internet Control: Theory and Evidence from Transparency Reports (with Pengfei Zhang)

Internet control has long been considered a feature of authoritarian regimes alone. Drawing on data from Google and Twitter transparency reports, we observe that democratic countries remove as much content as their authoritarian counterparts. The distinction between the two regime types lies not in the quantity but in the method of content removal: democracies refrain from government takedowns and instead delegate the right of removal to users. This paper conjectures that politicians’ reputational concerns are the key to understanding this phenomenon. To that end, we develop a political agency model that explains the stylized facts and yields testable hypotheses. Using the timing of elections as a natural experiment, we provide supporting evidence that takedown requests from democratic governments decreased significantly as elections approached. This reputation effect is not observed in authoritarian regimes or in other types of requests.

Two Types of Censorship? An Assessment of the Informational Autocracy Thesis in the Online Space

This study tests how Guriev & Treisman’s Informational Autocracy (IA) theory applies to internet filtering practices in autocratic nations. It examines the different strategies of online censorship employed by autocratic regimes, analyzing how these practices align with or deviate from the predictions of the IA theory.

How do Interest Groups adapt their Communication Strategy to Big Shocks? An analysis of the Medicare-For-All Debate on Twitter during COVID-19. (with Sushant Kumar & Pengfei Zhang)

COVID-19 has reinvigorated the policy debate over a universal healthcare system, attracting much attention on social media. In this paper, we study the online discourse on Medicare-For-All before and after COVID-19 by examining the Twitter feeds of two opposing health advocacy groups: Physicians for a National Health Program (PNHP) and Partnership for America’s Healthcare Future (P4AHCF). Our empirical results show a sharp contrast between the two interest groups’ communication strategies. PNHP’s tweets feature more personalized stories, whereas P4AHCF’s tweets feature more statistics and scientific reports. The difference in text styles is consequential. PNHP attracts higher engagement from Twitter users and adapts more readily to a pandemic narrative. By contrast, P4AHCF stopped tweeting about Medicare-For-All entirely after COVID-19 was declared a pandemic. We argue that the distinctive social media strategies can be explained by the groups’ different audiences and objectives. The findings add to our understanding of Americans’ activism on social media and the implications of the pandemic for health policy reform.

Polarization within U.S. Foreign Media (with Arslan Khalid & Kiwan Park)

In today’s highly polarized media landscape, understanding the potential biases and relationships between news publications has become increasingly important. This study investigates the existence of media bias by examining the content of online news articles. We leverage a dataset containing over 200,000 articles from various media outlets, spanning diverse political orientations and subject matter. Using unsupervised machine learning techniques, we first employ clustering algorithms, such as k-means and hierarchical clustering, to group similar articles based on their content. This analysis enables us to identify patterns and common themes within the clusters, shedding light on the potential ideological leanings of the publications. Our difference-in-differences (DID) analysis shows that, on average, sentiment scores for articles published by liberal-leaning publications shift slightly in the negative direction from before to after the 2016 US presidential election, relative to articles published by conservative-leaning publications. The regression analysis indicates that a publication’s ideological leaning is significantly associated with the overall sentiment score of its articles, with liberal-leaning publications having higher sentiment scores on average than conservative-leaning publications.
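The DID (difference-in-differences) comparison mentioned above can be illustrated with a minimal sketch. This is not the paper's actual specification, which is not given on this page: the sentiment scores below are made-up toy values, and the function name `did_estimate` and the group and period labels are hypothetical names chosen for illustration. The sketch computes only the basic two-group, two-period estimate from cell means.

```python
from statistics import mean

# Hypothetical toy sentiment scores (NOT data from the paper), keyed by
# (publication leaning, period relative to the election).
scores = {
    ("liberal", "pre"): [0.20, 0.10, 0.15],
    ("liberal", "post"): [0.05, 0.00, 0.10],
    ("conservative", "pre"): [0.00, -0.05, 0.05],
    ("conservative", "post"): [0.00, -0.10, 0.04],
}


def did_estimate(cells, treated="liberal", control="conservative"):
    """Basic 2x2 difference-in-differences from cell means.

    (treated post - treated pre) - (control post - control pre)
    """
    treated_change = mean(cells[(treated, "post")]) - mean(cells[(treated, "pre")])
    control_change = mean(cells[(control, "post")]) - mean(cells[(control, "pre")])
    return treated_change - control_change


estimate = did_estimate(scores)
print(f"DID estimate: {estimate:+.3f}")
```

A negative estimate here means the first group's average sentiment fell by more than the comparison group's over the same window. A full analysis would typically estimate this in a regression with an interaction term and standard errors rather than from raw means.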

Introducing the ConfliBERT Family of Language Models for Political Science (with Patrick Brandt, Vito D’Orazio & Javier Osorio)

For decades, conflict scholars have used rule-based approaches to extract information about political violence from newspapers around the world. Recent advances in Natural Language Processing allow us to overcome the rigidity of rule-based approaches. We review our recent ConfliBERT language model (Hu et al. 2022) and its applications in political science. ConfliBERT is a Large Language Model (LLM) specifically developed to process text related to politics and violence. It was trained on a large domain-specific English corpus of text on conflict, political violence, and international politics with global coverage. Results show that, when fine-tuned, ConfliBERT outperforms other LLMs like Gemma 2 (9B) and Llama 3.1 (7B) within its relevant domains. We then discuss multilingual extensions of ConfliBERT for Spanish and Arabic source texts and show that ConfliBERT also outperforms alternative models in their native languages. Finally, we discuss limitations of the models and propose further extensions.

Populism and the Price of Abandonment: Analyzing the Exit Dynamics from Bilateral Investment Treaties (with Clint Peinhardt)

This study explores how rising nationalist sentiment influences countries’ decisions to exit international investment agreements, combining economic data with text analysis of political rhetoric.

Expanding the Horizons: Extension of Borzyskowski & Vabulas (2019)

Perplexed by the concept of contagion in IGO withdrawals, this extension explores the network structures and social dynamics within international organizations.

Dissertation

Dissertation Proposal

My dissertation examines internet censorship across various regimes, focusing on the intricacies of content moderation and the political implications thereof.

View Presentation | Read Proposal

Large Language Models

ConfliBERT Usage Manual & Finetuning Guide (with ConfliBERT Lab)

This document (continually updated!) walks political scientists through using a Large Language Model (LLM) for various tasks such as classification, masking, named entity recognition, and question answering. The document also outlines the process of finetuning the LLM on users’ own datasets with the help of a Google Colab script.

View Manual