publications | Caleb Ziems

2024

CultureBank: An online community-driven knowledge base towards culturally aware language technologies

Weiyan Shi, Ryan Li, Yutong Zhang, Caleb Ziems, Raya Horesh, Rogério Abreu de Paula, and Diyi Yang

In Findings of the Association for Computational Linguistics: EMNLP 2024, Nov 2024

EMNLP

To enhance language models’ cultural awareness, we design a generalizable pipeline to construct cultural knowledge bases from different online communities on a massive scale. With the pipeline, we construct CultureBank, a knowledge base built upon users’ self-narratives with 12K cultural descriptors sourced from TikTok and 11K from Reddit. Unlike previous cultural knowledge resources, CultureBank contains diverse views on cultural descriptors to allow flexible interpretation of cultural knowledge, and contextualized cultural scenarios to help grounded evaluation. With CultureBank, we evaluate different LLMs’ cultural awareness, and identify areas for improvement. We also fine-tune a language model on CultureBank: experiments show that it achieves better performance on two downstream cultural tasks in a zero-shot setting. Finally, we offer recommendations based on our findings for future culturally aware language technologies.
@inproceedings{shi2024culturebank, title = {{CultureBank}: An online community-driven knowledge base towards culturally aware language technologies}, author = {Shi, Weiyan and Li, Ryan and Zhang, Yutong and Ziems, Caleb and Horesh, Raya and de Paula, Rog{\'e}rio Abreu and Yang, Diyi}, booktitle = {Findings of the Association for Computational Linguistics: EMNLP 2024}, month = nov, year = {2024}, address = {Miami, Florida}, publisher = {Association for Computational Linguistics}, }
Measuring and Addressing Indexical Bias in Information Retrieval

Caleb Ziems, William Held, Jane Dwivedi-Yu, and Diyi Yang

In Findings of the Association for Computational Linguistics: ACL 2024, Aug 2024

ACL

Information Retrieval (IR) systems are designed to deliver relevant content, but traditional systems may not optimize rankings for fairness, neutrality, or the balance of ideas. Consequently, IR can often introduce indexical biases, or biases in the positional order of documents. Although indexical bias can demonstrably affect people’s opinions, voting patterns, and other behaviors, these issues remain understudied as the field lacks reliable metrics and procedures for automatically measuring indexical bias. Towards this end, we introduce the PAIR framework, which supports automatic bias audits for ranked documents or entire IR systems. After introducing DUO, the first general-purpose automatic bias metric, we run an extensive evaluation of 8 IR systems on a new corpus of 32k synthetic and 4.7k natural documents, with 4k queries spanning 1.4k controversial issue topics. A human behavioral study validates our approach, showing that our bias metric can help predict when and how indexical bias will shift a reader’s opinion.
@inproceedings{ziems2024pair, title = {Measuring and Addressing Indexical Bias in Information Retrieval}, author = {Ziems, Caleb and Held, William and Dwivedi-Yu, Jane and Yang, Diyi}, editor = {Ku, Lun-Wei and Martins, Andre and Srikumar, Vivek}, booktitle = {Findings of the Association for Computational Linguistics: ACL 2024}, month = aug, year = {2024}, address = {Bangkok, Thailand and virtual meeting}, publisher = {Association for Computational Linguistics}, pages = {12860--12877}, }
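The abstract does not spell out the DUO metric, so the following is only a hypothetical illustration of what "bias in the positional order of documents" can mean: two rankings can contain exactly the same documents yet lean in opposite directions because of their order. It assumes each retrieved document already carries a stance label toward the issue (-1 against, 0 neutral, +1 in favor) and weights ranks with a DCG-style logarithmic discount. The function name, the stance labels, and the discount are all assumptions; this is not DUO.

```python
import math


def positional_stance_skew(stances: list[int]) -> float:
    """Hypothetical rank-discounted stance imbalance (NOT the paper's DUO metric).

    stances[i] is the assumed stance of the document at rank i + 1:
    -1 (against), 0 (neutral), +1 (in favor).
    Returns a weighted mean in [-1, 1]; positive means "in favor" documents
    dominate the top of the ranking.
    """
    if not stances:
        return 0.0
    # DCG-style discount: rank 1 gets weight 1 / log2(2) = 1, later ranks get less.
    weights = [1.0 / math.log2(rank + 2) for rank in range(len(stances))]
    weighted_sum = sum(w * s for w, s in zip(weights, stances))
    return weighted_sum / sum(weights)
```

For example, [1, 1, -1, -1] and [-1, -1, 1, 1] hold the same set of documents but receive scores of equal magnitude and opposite sign. The log discount is used here only because it is the familiar weighting from ranking evaluation.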
Social Intelligence Data Infrastructure: Structuring the Present and Navigating the Future

Minzhi Li, Weiyan Shi, Caleb Ziems, and Diyi Yang

In Findings of the Association for Computational Linguistics: ACL 2024, Aug 2024

ACL

As Natural Language Processing (NLP) systems become increasingly integrated into human social life, these technologies will need to increasingly rely on social intelligence. Although there are many valuable datasets that benchmark isolated dimensions of social intelligence, there does not yet exist any body of work to join these threads into a cohesive subfield in which researchers can quickly identify research gaps and future directions. Towards this goal, we build a Social AI Data Infrastructure, which consists of a comprehensive social AI taxonomy and a data library of 480 NLP datasets. Our infrastructure allows us to analyze existing dataset efforts, and also evaluate language models’ performance in different social intelligence aspects. Our analyses demonstrate its utility in enabling a thorough understanding of the current data landscape and providing a holistic perspective on potential directions for future dataset development. We show there is a need for multifaceted datasets, increased diversity in language and culture, more long-tailed social situations, and more interactive data in future social intelligence data efforts.
@inproceedings{li2024socialAIinfra, title = {Social Intelligence Data Infrastructure: Structuring the Present and Navigating the Future}, author = {Li, Minzhi and Shi, Weiyan and Ziems, Caleb and Yang, Diyi}, booktitle = {Findings of the Association for Computational Linguistics: ACL 2024}, year = {2024}, month = aug, address = {Bangkok, Thailand}, publisher = {Association for Computational Linguistics}, }
Silent Signals, Loud Impact: LLMs for Word-Sense Disambiguation of Coded Dog Whistles

Julia Kruk, Michela Marchini, Rijul Magu, Caleb Ziems, David Muchlinski, and Diyi Yang

In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Aug 2024

ACL

A dog whistle is a form of coded communication with a secondary meaning that is often weaponized for racial discrimination. Dog whistles historically began in United States politics, but soon also took root in social media as a means of evading hate speech detection systems and maintaining plausible deniability. In this paper, we present an approach for word-sense disambiguation of dog whistles from standard speech using Large Language Models (LLMs), and leverage this technique to create a dataset of 11,570 high-confidence coded examples of dog whistles used in formal and informal communication. Silent Signals is the largest dataset of disambiguated dog whistle usage, created for applications in hate speech detection, neology, and political science.
@inproceedings{kruk2024silentsignals, title = {Silent Signals, Loud Impact: LLMs for Word-Sense Disambiguation of Coded Dog Whistles}, author = {Kruk, Julia and Marchini, Michela and Magu, Rijul and Ziems, Caleb and Muchlinski, David and Yang, Diyi}, booktitle = {Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)}, year = {2024}, month = aug, address = {Bangkok, Thailand}, publisher = {Association for Computational Linguistics}, }
Can Large Language Models Transform Computational Social Science?

Caleb Ziems, William Held, Omar Shaikh, Jiaao Chen, Zhehao Zhang, and Diyi Yang

Computational Linguistics, Mar 2024

CL

Large Language Models (LLMs) are capable of successfully performing many language processing tasks zero-shot (without training data). If zero-shot LLMs can also reliably classify and explain social phenomena like persuasiveness and political ideology, then LLMs could augment the Computational Social Science (CSS) pipeline in important ways. This work provides a road map for using LLMs as CSS tools. Towards this end, we contribute a set of prompting best practices and an extensive evaluation pipeline to measure the zero-shot performance of 13 language models on 25 representative English CSS benchmarks. On taxonomic labeling tasks (classification), LLMs fail to outperform the best fine-tuned models but still achieve fair levels of agreement with humans. On free-form coding tasks (generation), LLMs produce explanations that often exceed the quality of crowdworkers’ gold references. We conclude that the performance of today’s LLMs can augment the CSS research pipeline in two ways: (1) serving as zero-shot data annotators on human annotation teams, and (2) bootstrapping challenging creative generation tasks (e.g., explaining the underlying attributes of a text). In summary, LLMs are poised to meaningfully participate in social science analysis in partnership with humans.
@article{ziems2024css, title = {Can Large Language Models Transform Computational Social Science?}, author = {Ziems, Caleb and Held, William and Shaikh, Omar and Chen, Jiaao and Zhang, Zhehao and Yang, Diyi}, year = {2024}, month = mar, journal = {Computational Linguistics}, volume = {50}, number = {1}, publisher = {MIT Press}, }
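The abstract reports "fair levels of agreement" between zero-shot LLM labels and human labels but does not name the statistic here. One standard chance-corrected measure of agreement between two annotators is Cohen's kappa, sketched below on the assumption that it is a reasonable stand-in; the paper's own agreement metric may differ, and the helper name is illustrative.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two equal-length label sequences (e.g. LLM vs. human)."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items on which both annotators chose the same label.
    p_observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected by chance if each annotator labeled independently
    # according to their own label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_expected = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)
    if p_expected == 1.0:
        # Both annotators used one identical label for every item; kappa is
        # undefined here, so this sketch reports perfect agreement by convention.
        return 1.0
    return (p_observed - p_expected) / (1 - p_expected)
```

Kappa subtracts the agreement two annotators would reach by chance, so an LLM that always predicts the majority class does not get credit merely for matching frequent labels.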

2023

CoAnnotating: Uncertainty-Guided Work Allocation between Human and Large Language Models for Data Annotation

Minzhi Li, Taiwei Shi, Caleb Ziems, Min-Yen Kan, Nancy F. Chen, Zhengyuan Liu, and Diyi Yang

In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, Dec 2023

EMNLP

Annotated data plays a critical role in Natural Language Processing (NLP) in training models and evaluating their performance. Given recent developments in Large Language Models (LLMs), models such as ChatGPT demonstrate zero-shot capability on many text-annotation tasks, comparable with or even exceeding human annotators. Such LLMs can serve as alternatives to manual annotation, due to lower costs and higher scalability. However, limited work has leveraged LLMs as complementary annotators or explored how annotation work is best allocated among humans and LLMs to achieve both quality and cost objectives. We propose CoAnnotating, a novel paradigm for Human-LLM co-annotation of unstructured texts at scale. Under this framework, we utilize uncertainty to estimate LLMs’ annotation capability. Results on different datasets show CoAnnotating to be an effective means of allocating work, with up to 21% performance improvement over a random baseline.
@inproceedings{li2023coannotating, title = {CoAnnotating: Uncertainty-Guided Work Allocation between Human and Large Language Models for Data Annotation}, author = {Li, Minzhi and Shi, Taiwei and Ziems, Caleb and Kan, Min-Yen and Chen, Nancy F. and Liu, Zhengyuan and Yang, Diyi}, booktitle = {Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing}, month = dec, year = {2023}, address = {Singapore}, publisher = {Association for Computational Linguistics}, }
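The abstract says CoAnnotating uses uncertainty to estimate an LLM's annotation capability and to allocate work between humans and the LLM. A minimal sketch of that idea, assuming uncertainty is measured as the entropy of several LLM labels collected for the same item (for example, under different prompt variants) and that humans receive a fixed budget of the most uncertain items. `label_entropy` and `allocate` are hypothetical helpers, not the released code.

```python
import math
from collections import Counter


def label_entropy(labels):
    """Shannon entropy (bits) of the empirical distribution over repeated LLM labels."""
    counts = Counter(labels)
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def allocate(llm_labels_per_item, human_budget):
    """Split items between human annotators and the LLM by uncertainty (hypothetical).

    llm_labels_per_item[i] holds one or more LLM labels for item i.
    The `human_budget` items with the highest entropy go to humans; every
    other item keeps the LLM's majority label.
    Returns (sorted indices for humans, {index: LLM label}).
    """
    order = sorted(
        range(len(llm_labels_per_item)),
        key=lambda i: label_entropy(llm_labels_per_item[i]),
        reverse=True,  # most uncertain first; ties keep their original order
    )
    to_humans = set(order[:human_budget])
    llm_assigned = {
        i: Counter(labels).most_common(1)[0][0]
        for i, labels in enumerate(llm_labels_per_item)
        if i not in to_humans
    }
    return sorted(to_humans), llm_assigned
```

For instance, with items [["pos", "pos", "pos"], ["pos", "neg", "neg"], ["neg", "neg", "neg"]] and a budget of one, only the item whose LLM labels disagree is routed to a human. A fixed budget is the simplest allocation rule; a threshold on entropy would be an equally plausible variant.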
Impressions: Understanding Visual Semiotics and Aesthetic Impact

Julia Kruk, Caleb Ziems, and Diyi Yang

In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, Dec 2023

EMNLP

Is aesthetic impact different from beauty? Is visual salience a reflection of its capacity for effective communication? We present Impressions, a novel dataset through which to investigate the semiotics of images, and how specific visual features and design choices can elicit specific emotions, thoughts and beliefs. We posit that the impactfulness of an image extends beyond formal definitions of aesthetics, to its success as a communicative act, where style contributes as much to meaning formation as the subject matter. However, prior image captioning datasets are not designed to empower state-of-the-art architectures to model potential human impressions or interpretations of images. To fill this gap, we design an annotation task heavily inspired by image analysis techniques in the Visual Arts to collect 1,440 image-caption pairs and 4,320 unique annotations exploring impact, pragmatic image description, impressions, and aesthetic design choices. We show that existing multimodal image captioning and conditional generation models struggle to simulate plausible human responses to images. However, this dataset significantly improves their ability to model impressions and aesthetic evaluations of images through fine-tuning and few-shot adaptation.
@inproceedings{kruk2023impressions, title = {Impressions: Understanding Visual Semiotics and Aesthetic Impact}, author = {Kruk, Julia and Ziems, Caleb and Yang, Diyi}, booktitle = {Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing}, month = dec, year = {2023}, address = {Singapore}, publisher = {Association for Computational Linguistics}, }
Multi-VALUE: A Framework for Cross-Dialectal English NLP

Caleb Ziems, William Held, Jingfeng Yang, Jwala Dhamala, Rahul Gupta, and Diyi Yang

In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Jul 2023

ACL

Dialect differences caused by regional, social, and economic factors cause performance discrepancies for many groups of language technology users. Inclusive and equitable language technology must critically be dialect invariant, meaning that performance remains constant over dialectal shifts. Current systems often fall short of this ideal since they are designed and tested on a single dialect: Standard American English (SAE). We introduce a suite of resources for evaluating and achieving English dialect invariance. The resource is called Multi-VALUE, a controllable rule-based translation system spanning 50 English dialects and 189 unique linguistic features. Multi-VALUE maps SAE to synthetic forms of each dialect. First, we use this system to stress test question answering, machine translation, and semantic parsing. Stress tests reveal significant performance disparities for leading models on non-standard dialects. Second, we use this system as a data augmentation technique to improve the dialect robustness of existing systems. Finally, we partner with native speakers of Chicano and Indian English to release new gold-standard variants of the popular CoQA task. To execute the transformation code, run model checkpoints, and download both synthetic and gold-standard dialectal benchmark datasets, see http://value-nlp.org
@inproceedings{ziems2023multi, title = {Multi-VALUE: A Framework for Cross-Dialectal English NLP}, author = {Ziems, Caleb and Held, William and Yang, Jingfeng and Dhamala, Jwala and Gupta, Rahul and Yang, Diyi}, booktitle = {Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)}, year = {2023}, month = jul, address = {Toronto, Canada}, publisher = {Association for Computational Linguistics}, }
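Multi-VALUE is described as a controllable, rule-based system that maps SAE text to dialectal forms feature by feature. The toy sketch below shows only that general shape: each feature is a separate rewrite rule, and a profile of enabled features controls which rules fire. The two regex rules (zero copula and negative concord) are simplified stand-ins chosen for illustration; they are not Multi-VALUE's implementation or API, which the abstract does not describe, and real rules need syntactic context that regexes cannot capture.

```python
import re


# Toy rules only: each rewrites SAE text to exhibit one dialect feature.
# These regexes are illustrative stand-ins, NOT Multi-VALUE's actual rules.
def zero_copula(text: str) -> str:
    # "they are late" -> "they late": drop is/are after a subject pronoun.
    return re.sub(r"\b(he|she|we|you|they) (?:is|are) ", r"\1 ", text, flags=re.IGNORECASE)


def negative_concord(text: str) -> str:
    # "don't have any" -> "don't have no": negate the indefinite after a negated verb.
    return re.sub(
        r"\b(don't|doesn't|didn't|can't|won't) (\w+) any\b", r"\1 \2 no", text, flags=re.IGNORECASE
    )


RULES = {"zero_copula": zero_copula, "negative_concord": negative_concord}


def transform(text: str, enabled_features: list[str]) -> str:
    """Apply only the enabled feature rules, in order (a hypothetical dialect profile)."""
    for name in enabled_features:
        text = RULES[name](text)
    return text
```

Usage: transform("They are late and they don't have any time.", ["zero_copula", "negative_concord"]) returns "They late and they don't have no time.", while enabling only "zero_copula" leaves "any" untouched. Keeping each feature as an independent rule is what makes such a system controllable: a dialect profile is just a choice of which rules to enable.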
NormBank: A Knowledge Bank of Situational Social Norms

Caleb Ziems, Jane Dwivedi-Yu, Yi-Chia Wang, Alon Halevy, and Diyi Yang

In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Jul 2023

ACL

We present NormBank, a knowledge bank of 155k situational norms. This resource is designed to ground flexible normative reasoning for interactive, assistive, and collaborative AI systems. Unlike prior commonsense resources, NormBank grounds each inference within a multivalent sociocultural frame, which includes the setting (e.g., restaurant), the agents’ contingent roles (waiter, customer), their attributes (age, gender), and other physical, social, and cultural constraints (e.g., the temperature or the country of operation). In total, NormBank contains 63k unique constraints from a taxonomy that we introduce and iteratively refine here. Constraints then apply in different combinations to frame social norms. Under these manipulations, norms are non-monotonic: one can cancel an inference by updating its frame even slightly. Still, we find evidence that neural models can help reliably extend the scope and coverage of NormBank. We further demonstrate the utility of this resource with a series of transfer experiments.
@inproceedings{ziems2023norm, title = {NormBank: A Knowledge Bank of Situational Social Norms}, author = {Ziems, Caleb and Dwivedi-Yu, Jane and Wang, Yi-Chia and Halevy, Alon and Yang, Diyi}, booktitle = {Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)}, year = {2023}, month = jul, address = {Toronto, Canada}, publisher = {Association for Computational Linguistics}, }
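Non-monotonicity means that adding a constraint to a frame can overturn a judgment that held for the smaller frame. The toy lookup below illustrates this under a simplifying assumption: the most specific stored frame that matches the query wins. The norms and the `judge` helper are invented for illustration; they are not NormBank entries or its interface.

```python
from typing import Optional

# Toy norms keyed by (behavior, frozen set of (attribute, value) constraints).
# Invented for illustration; these are NOT entries from NormBank.
NORMS = {
    ("order wine", frozenset({("setting", "restaurant"), ("role", "customer")})): "okay",
    (
        "order wine",
        frozenset({("setting", "restaurant"), ("role", "customer"),
                   ("country", "US"), ("age", "under 21")}),
    ): "not okay",
}


def judge(behavior: str, frame: dict) -> Optional[str]:
    """Return the label of the most specific stored frame whose constraints all hold.

    Returns None when no stored frame matches; among equally specific matches,
    the first one found is kept.
    """
    facts = set(frame.items())
    best, best_size = None, -1
    for (stored_behavior, constraints), label in NORMS.items():
        if stored_behavior == behavior and constraints <= facts and len(constraints) > best_size:
            best, best_size = label, len(constraints)
    return best
```

Adding country and age constraints to the restaurant frame flips the judgment from "okay" to "not okay", even though every constraint of the smaller frame still holds.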
TADA: Task-Agnostic Dialect Adapters for English

William Held, Caleb Ziems, and Diyi Yang

In Findings of the Association for Computational Linguistics: ACL 2023, Jul 2023

ACL

Large Language Models, the dominant starting point for Natural Language Processing (NLP) applications, fail at a higher rate for speakers of English dialects other than Standard American English (SAE). Prior work addresses this using task-specific data or synthetic data augmentation, both of which require intervention for each dialect and task pair. This poses a scalability issue that prevents the broad adoption of robust dialectal English NLP. We introduce a simple yet effective method for task-agnostic dialect adaptation by aligning non-SAE dialects using adapters and composing them with task-specific adapters from SAE. Task-Agnostic Dialect Adapters (TADA) improve dialectal robustness on 4 dialectal variants of the GLUE benchmark without task-specific supervision.
@inproceedings{held2023tada, title = {TADA: Task-Agnostic Dialect Adapters for English}, author = {Held, William and Ziems, Caleb and Yang, Diyi}, booktitle = {Findings of the Association for Computational Linguistics: ACL 2023}, year = {2023}, month = jul, address = {Toronto, Canada}, publisher = {Association for Computational Linguistics}, }
Modeling Cross-Cultural Pragmatic Inference with Codenames Duet

Omar Shaikh, Caleb Ziems, William Held, Aryan J. Pariani, Fred Morstatter, and Diyi Yang

In Findings of the Association for Computational Linguistics: ACL 2023, Jul 2023

ACL

Pragmatic reference enables efficient interpersonal communication. Prior work uses simple reference games to test models of pragmatic reasoning, often with unidentified speakers and listeners. In practice, however, speakers’ sociocultural background shapes their pragmatic assumptions. For example, readers of this paper assume NLP refers to ‘Natural Language Processing,’ and not ‘Neuro-linguistic Programming.’ This work introduces the Cultural Codes dataset, which operationalizes sociocultural pragmatic inference in a simple word reference game. Cultural Codes is based on the multi-turn collaborative two-player game, Codenames Duet. Our dataset consists of 794 games with 7,703 turns, distributed across 153 unique players. Alongside gameplay, we collect information about players’ personalities, values, and demographics. Utilizing theories of communication and pragmatics, we predict each player’s actions via joint modeling of their sociocultural priors and the game context. Our experiments show that accounting for background characteristics significantly improves model performance for tasks related to both clue giving and guessing, indicating that sociocultural priors play a vital role in gameplay decisions.
@inproceedings{shaikh2023modeling, title = {Modeling Cross-Cultural Pragmatic Inference with Codenames Duet}, author = {Shaikh, Omar and Ziems, Caleb and Held, William and Pariani, Aryan J. and Morstatter, Fred and Yang, Diyi}, booktitle = {Findings of the Association for Computational Linguistics: ACL 2023}, year = {2023}, month = jul, address = {Toronto, Canada}, publisher = {Association for Computational Linguistics}, }

2022

Inducing Positive Perspectives with Text Reframing

Caleb Ziems, Minzhi Li, Anthony Zhang, and Diyi Yang

In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), May 2022

Outstanding Paper

ACL

Sentiment transfer is one popular example of a text style transfer task, where the goal is to reverse the sentiment polarity of a text. With a sentiment reversal comes also a reversal in meaning. We introduce a different but related task called positive reframing in which we neutralize a negative point of view and generate a more positive perspective for the author without contradicting the original meaning. Our insistence on meaning preservation makes positive reframing a challenging and semantically rich task. To facilitate rapid progress, we introduce a large-scale benchmark, Positive Psychology Frames, with 8,349 sentence pairs and 12,755 structured annotations to explain positive reframing in terms of six theoretically-motivated reframing strategies. Then we evaluate a set of state-of-the-art text style transfer models, and conclude by discussing key challenges and directions for future work.
@inproceedings{ziems-etal-2022-positive-frames, title = {Inducing Positive Perspectives with Text Reframing}, author = {Ziems, Caleb and Li, Minzhi and Zhang, Anthony and Yang, Diyi}, booktitle = {Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)}, month = may, year = {2022}, address = {Dublin, Ireland}, publisher = {Association for Computational Linguistics}, url = {https://aclanthology.org/2022.acl-long.257}, pages = {3682--3700}, note = {Outstanding Paper} }
The Moral Integrity Corpus: A Benchmark for Ethical Dialogue Systems

Caleb Ziems, Jane Yu, Yi-Chia Wang, Alon Halevy, and Diyi Yang

In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), May 2022

ACL

Conversational agents have come increasingly close to human competence in open-domain dialogue settings; however, such models can reflect insensitive, hurtful, or entirely incoherent viewpoints that erode a user’s trust in the moral integrity of the system. Moral deviations are difficult to mitigate because moral judgments are not universal, and there may be multiple competing judgments that apply to a situation simultaneously. In this work, we introduce a new resource, not to authoritatively resolve moral ambiguities, but instead to facilitate systematic understanding of the intuitions, values and moral judgments reflected in the utterances of dialogue systems. The Moral Integrity Corpus, MIC, is such a resource, which captures the moral assumptions of 38k prompt-reply pairs, using 99k distinct Rules of Thumb (RoTs). Each RoT reflects a particular moral conviction that can explain why a chatbot’s reply may appear acceptable or problematic. We further organize RoTs with a set of 9 moral and social attributes and benchmark performance for attribute classification. Most importantly, we show that current neural language models can automatically generate new RoTs that reasonably describe previously unseen interactions, but they still struggle with certain scenarios. Our findings suggest that MIC will be a useful resource for understanding language models’ implicit moral assumptions and flexibly benchmarking the integrity of conversational agents.
@inproceedings{ziems-etal-2022-mic, title = {The Moral Integrity Corpus: A Benchmark for Ethical Dialogue Systems}, author = {Ziems, Caleb and Yu, Jane and Wang, Yi-Chia and Halevy, Alon and Yang, Diyi}, booktitle = {Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)}, month = may, year = {2022}, address = {Dublin, Ireland}, publisher = {Association for Computational Linguistics}, url = {https://aclanthology.org/2022.acl-long.261}, pages = {3755--3773}, }
VALUE: Understanding Dialect Disparity in NLU

Caleb Ziems, Jiaao Chen, Camille Harris, Jessica Anderson, and Diyi Yang

In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), May 2022

ACL

English Natural Language Understanding (NLU) systems have achieved great performances and even outperformed humans on benchmarks like GLUE and SuperGLUE. However, these benchmarks contain only textbook Standard American English (SAE). Other dialects have been largely overlooked in the NLP community. This leads to biased and inequitable NLU systems that serve only a sub-population of speakers. To understand disparities in current models and to facilitate more dialect-competent NLU systems, we introduce the VernAcular Language Understanding Evaluation (VALUE) benchmark, a challenging variant of GLUE that we created with a set of lexical and morphosyntactic transformation rules. In this initial release (V.1), we construct rules for 11 features of African American Vernacular English (AAVE), and we recruit fluent AAVE speakers to validate each feature transformation via linguistic acceptability judgments in a participatory design manner. Experiments show that these new dialectal features can lead to a drop in model performance.
@inproceedings{ziems-etal-2022-value, title = {{VALUE}: {U}nderstanding Dialect Disparity in {NLU}}, author = {Ziems, Caleb and Chen, Jiaao and Harris, Camille and Anderson, Jessica and Yang, Diyi}, booktitle = {Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)}, month = may, year = {2022}, address = {Dublin, Ireland}, publisher = {Association for Computational Linguistics}, url = {https://aclanthology.org/2022.acl-long.258}, pages = {3701--3720}, }

2021

Latent Hatred: A Benchmark for Understanding Implicit Hate Speech

Mai ElSherief, Caleb Ziems, David Muchlinski, Vaishnavi Anupindi, Jordyn Seybolt, Munmun De Choudhury, and Diyi Yang

In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, Nov 2021

EMNLP

Hate speech has grown significantly on social media, causing serious consequences for victims of all demographics. Despite much attention being paid to characterize and detect discriminatory speech, most work has focused on explicit or overt hate speech, failing to address a more pervasive form based on coded or indirect language. To fill this gap, this work introduces a theoretically-justified taxonomy of implicit hate speech and a benchmark corpus with fine-grained labels for each message and its implication. We present systematic analyses of our dataset using contemporary baselines to detect and explain implicit hate speech, and we discuss key features that challenge existing models. This dataset will continue to serve as a useful benchmark for understanding this multifaceted issue.
@inproceedings{elsherief-2021-latent-hatred, title = {Latent Hatred: A Benchmark for Understanding Implicit Hate Speech}, author = {ElSherief, Mai and Ziems, Caleb and Muchlinski, David and Anupindi, Vaishnavi and Seybolt, Jordyn and De Choudhury, Munmun and Yang, Diyi}, booktitle = {Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing}, month = nov, year = {2021}, address = {Online and Punta Cana, Dominican Republic}, publisher = {Association for Computational Linguistics}, url = {https://aclanthology.org/2021.emnlp-main.29}, pages = {345--363}, }
To Protect and To Serve? Analyzing Entity-Centric Framing of Police Violence

Caleb Ziems and Diyi Yang

In Findings of the Association for Computational Linguistics: EMNLP 2021, Nov 2021

EMNLP

Framing has significant but subtle effects on public opinion and policy. We propose an NLP framework to measure entity-centric frames. We use it to understand media coverage on police violence in the United States in a new Police Violence Frames Corpus of 82k news articles spanning 7k police killings. Our work uncovers more than a dozen framing devices and reveals significant differences in the way liberal and conservative news sources frame both the issue of police violence and the entities involved. Conservative sources emphasize when the victim is armed or attacking an officer and are more likely to mention the victim’s criminal record. Liberal sources focus more on the underlying systemic injustice, highlighting the victim’s race and that they were unarmed. We discover temporary spikes in these injustice frames near high-profile shooting events, and finally, we show protest volume correlates with and precedes media framing decisions.
@inproceedings{ziems-yang-2021-protect-serve, title = {To Protect and To Serve? Analyzing Entity-Centric Framing of Police Violence}, author = {Ziems, Caleb and Yang, Diyi}, booktitle = {Findings of the Association for Computational Linguistics: EMNLP 2021}, month = nov, year = {2021}, address = {Punta Cana, Dominican Republic}, publisher = {Association for Computational Linguistics}, url = {https://aclanthology.org/2021.findings-emnlp.82}, pages = {957--976}, }
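The abstract reports that protest volume correlates with, and precedes, media framing decisions. One simple way to probe such a lead relationship is to correlate protest counts at time t with framing counts at time t + lag. The sketch below does this with a plain Pearson correlation; it is an assumed stand-in for illustration, not necessarily the statistical procedure used in the paper, and the helper names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Root sums of squares; the 1/n factors cancel in the ratio below.
    ss_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    ss_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (ss_x * ss_y)


def lagged_correlation(protests, framing, lag):
    """Correlate protests[t] with framing[t + lag]; lag > 0 asks whether protests lead."""
    if len(protests) != len(framing):
        raise ValueError("series must have equal length")
    if lag < 0 or lag >= len(protests) - 1:
        raise ValueError("need 0 <= lag < len(series) - 1")
    x = protests[: len(protests) - lag]
    y = framing[lag:]
    return pearson(x, y)
```

In a synthetic series where framing copies protest counts one step later, the lag-1 correlation is 1.0 while the lag-0 correlation is lower. A higher correlation at positive lags is suggestive of precedence but, on its own, is not evidence of causation.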
Racism is a Virus: Anti-Asian Hate and Counterhate in Social Media during the COVID-19 Crisis

Bing He, Caleb Ziems, Sandeep Soni, Naren Ramakrishnan, Diyi Yang, and Srijan Kumar

In 2021 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM), Nov 2021

ASONAM

The spread of COVID-19 has sparked racism and hate on social media targeted towards Asian communities. However, little is known about how racial hate spreads during a pandemic and the role of counterspeech in mitigating this spread. In this work, we study the evolution and spread of anti-Asian hate speech through the lens of Twitter. We create COVID-HATE, the largest dataset of anti-Asian hate and counterspeech spanning 14 months, containing over 206 million tweets, and a social network with over 127 million nodes. By creating a novel hand-labeled dataset of 3,355 tweets, we train a text classifier to identify hate and counterspeech tweets that achieves an average macro-F1 score of 0.832. Using this dataset, we conduct longitudinal analysis of tweets and users. Analysis of the social network reveals that hateful and counterspeech users interact and engage extensively with one another, instead of living in isolated polarized communities. We find that nodes were highly likely to become hateful after being exposed to hateful content. Notably, counterspeech messages may discourage users from turning hateful, potentially suggesting a solution to curb hate on the web and social media platforms.
@inproceedings{he2021yearlong, title = {Racism is a Virus: Anti-Asian Hate and Counterhate in Social Media during the COVID-19 Crisis}, author = {He, Bing and Ziems, Caleb and Soni, Sandeep and Ramakrishnan, Naren and Yang, Diyi and Kumar, Srijan}, booktitle = {2021 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM)}, year = {2021}, }
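The classifier result above is reported as macro-F1: F1 is computed separately for each class and then averaged without weighting by class size, so a rarer class such as counterspeech counts as much as a frequent one. A from-scratch sketch, assuming string class labels and non-empty inputs; label names such as "hate", "counter", and "neutral" are hypothetical and not taken from the paper's label set.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every class seen in y_true or y_pred."""
    classes = sorted(set(y_true) | set(y_pred))
    f1_scores = []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        # F1 is the harmonic mean of precision and recall (0 when both are 0).
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        f1_scores.append(f1)
    return sum(f1_scores) / len(f1_scores)
```

With y_true = ["hate", "counter", "neutral", "neutral"] and y_pred = ["hate", "neutral", "neutral", "counter"], the per-class F1 scores are 1.0, 0.0, and 0.5, so macro-F1 is 0.5 even though half of the predictions are correct.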

2020

Aggressive, Repetitive, Intentional, Visible, and Imbalanced: Refining Representations for Cyberbullying Classification

Caleb Ziems, Ymir Vigfusson, and Fred Morstatter

In Proceedings of the International AAAI Conference on Web and Social Media, Nov 2020

Best Paper Honorable Mention

ICWSM

Cyberbullying is a pervasive problem in online communities. To identify cyberbullying cases in large-scale social networks, content moderators depend on machine learning classifiers for automatic cyberbullying detection. However, existing models remain unfit for real-world applications, largely due to a shortage of publicly available training data and a lack of standard criteria for assigning ground truth labels. In this study, we address the need for reliable data using an original annotation framework. Inspired by social sciences research into bullying behavior, we characterize the nuanced problem of cyberbullying using five explicit factors to represent its social and linguistic aspects. We model this behavior using social network and language-based features, which improve classifier performance. These results demonstrate the importance of representing and modeling cyberbullying as a social phenomenon.
@inproceedings{ziems2020aggressive, title = {Aggressive, Repetitive, Intentional, Visible, and Imbalanced: Refining Representations for Cyberbullying Classification}, author = {Ziems, Caleb and Vigfusson, Ymir and Morstatter, Fred}, booktitle = {Proceedings of the International AAAI Conference on Web and Social Media}, volume = {14}, pages = {808--819}, year = {2020}, note = {Best Paper Honorable Mention}, }