WhoSaidIt: Human-LLM Collaborative Annotation for Text-Based Multilingual Speaker-Attribute Classification
Lingyu Gao, Will Monroe, David Smith, Meghan Jemison, Jackie Lee
Key claim
Cross-lingual differences significantly affect speaker-attribute annotations.
This paper presents a framework for re-annotating multilingual speaker attributes through human-LLM collaboration. Its central finding is that speaker attributes are annotated significantly differently across languages, which highlights both the potential and the limitations of LLMs for this task.
The claims are supported by an analysis of annotation divergence and by benchmarks against recent LLMs.
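To make "annotation divergence" concrete, the sketch below computes Cohen's kappa between two sets of labels, grouped by language. It is only an illustration of one common agreement measure, not the metric the paper reports; the function names, the language codes, and the placeholder labels "A" and "B" are all assumptions made for the example.

```python
from collections import Counter, defaultdict


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two equal-length label sequences."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label sequences must be non-empty and equal in length")
    n = len(labels_a)
    # Observed agreement: fraction of items where both annotators agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement under independence, from each annotator's label frequencies.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        # Both annotators used a single identical label; treat as perfect agreement.
        return 1.0
    return (observed - expected) / (1 - expected)


def kappa_by_language(records):
    """records: iterable of (language, label_a, label_b) tuples (hypothetical layout)."""
    grouped = defaultdict(lambda: ([], []))
    for language, label_a, label_b in records:
        grouped[language][0].append(label_a)
        grouped[language][1].append(label_b)
    return {lang: cohens_kappa(a, b) for lang, (a, b) in grouped.items()}


# Toy data only: invented labels, not taken from the WhoSaidIt dataset.
toy_records = [
    ("fr", "A", "A"), ("fr", "B", "B"), ("fr", "A", "A"), ("fr", "B", "B"),
    ("ja", "A", "A"), ("ja", "A", "B"), ("ja", "B", "A"), ("ja", "B", "B"),
]
per_language = kappa_by_language(toy_records)
```

On this toy input the two hypothetical languages come out at opposite ends of the scale, which is the kind of cross-lingual gap the key claim refers to. A real analysis would use the released labels and likely a multi-annotator measure.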
Deep reliability assessment
The methodology supports using LLMs to refine annotation guidelines and improve labeling consistency. However, the paper may overstate how far LLMs can resolve ambiguities in subjective tasks on their own, without human oversight.
Reproducibility
Yes. The dataset is available at https://github.com/duolingo/whosaidit
Discussion questions
- How do we ensure that the LLM's biases do not influence the final annotations?
- What are the implications of using LLMs for annotation in terms of labor costs and quality control?
- How would the annotations change if the LLMs had been trained on data from different cultural contexts?
Key figure
Figure 1 illustrates the dataset construction pipeline, highlighting the iterative process of LLM summarization and expert review for refining annotation guidelines.