Beyond Summaries: Structure-Aware Labeling of Code Changes with Large Language Models
Bar Weiss, Antonio Abu-Nassar, Adi Sosnovich, Karen Yorav
Key claim
LLM-based labeling significantly improves code review efficiency.
This paper presents an approach to improving code review efficiency that uses large language models (LLMs) to label the code changes in a patch. The method achieves high recall and precision, which suggests it could effectively complement traditional static analysis workflows.
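Precision and recall are the headline metrics, so it may help to see how they could be computed for multi-label change labeling. The sketch below rests on several assumptions. Each hunk has a gold label set and a predicted label set, and scores are micro-averaged over all (hunk, label) pairs. The function name, hunk ids and label names are all hypothetical. This is not necessarily the paper's evaluation protocol.

```python
from typing import Dict, Set, Tuple


def micro_precision_recall(
    gold: Dict[str, Set[str]],
    predicted: Dict[str, Set[str]],
) -> Tuple[float, float]:
    """Micro-averaged precision and recall over (hunk, label) pairs.

    Hypothetical helper: `gold` and `predicted` map a hunk id to its label set.
    """
    tp = fp = fn = 0
    # Union of hunk ids, so hunks missing from either side are still counted.
    for hunk_id in gold.keys() | predicted.keys():
        g = gold.get(hunk_id, set())
        p = predicted.get(hunk_id, set())
        tp += len(g & p)  # labels the model assigned correctly
        fp += len(p - g)  # labels the model assigned that are not in the gold set
        fn += len(g - p)  # gold labels the model missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Illustrative (made-up) labels for two hunks.
gold = {"h1": {"rename", "refactor"}, "h2": {"bug-fix"}}
pred = {"h1": {"rename"}, "h2": {"bug-fix", "logging"}}
print(micro_precision_recall(gold, pred))
```

In this toy example there are 2 true positives, 1 false positive and 1 false negative, so both precision and recall come out to 2/3. Macro-averaging per label would be a reasonable alternative when rare labels matter more.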
The paper introduces a novel two-stage pipeline for taxonomy-based labeling of code changes, extending LLM applications in code review.
Evaluating multiple LLMs on a curated benchmark provides solid experimental support for these claims.
Deep reliability assessment
The methodology supports the claim that LLMs can label code changes effectively. However, the paper may overstate how well the results generalize across programming languages and change types, given the limited dataset used for evaluation.
Reproducibility
Yes. The dataset is manually curated and includes both natural and synthetic patches, although no code repository is mentioned.
Discussion questions
- What assumptions are made about the language-agnostic capabilities of the LLMs in diverse programming environments?
- How can the findings be practically applied to improve existing code review processes in software development teams?
- What specific conditions or changes in the dataset would lead to a significant drop in the reported performance metrics?
Key figure
Figure 1 illustrates how diff hunks are labeled, showing how each label is assigned according to the changes detected in the code.
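Labels are attached to diff hunks, so a minimal sketch of splitting a unified diff into hunks may clarify the unit being labeled. The sketch relies only on the standard unified-diff hunk header `@@ -a,b +c,d @@`. It uses the line counts in that header to decide where each hunk ends. `Hunk` and `split_hunks` are hypothetical names. The paper's actual preprocessing may differ, for example by tracking file names or merging nearby hunks.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional

# Matches a unified-diff hunk header such as "@@ -10,2 +10,3 @@ def helper():".
# The ",count" parts are optional in the format and default to 1 when omitted.
HUNK_HEADER = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")


@dataclass
class Hunk:
    old_start: int
    new_start: int
    lines: List[str] = field(default_factory=list)

    @property
    def added(self) -> List[str]:
        return [ln[1:] for ln in self.lines if ln.startswith("+")]

    @property
    def removed(self) -> List[str]:
        return [ln[1:] for ln in self.lines if ln.startswith("-")]


def split_hunks(diff_text: str) -> List[Hunk]:
    """Split a unified diff into hunks, using header line counts to find hunk ends."""
    hunks: List[Hunk] = []
    current: Optional[Hunk] = None
    old_left = new_left = 0
    for line in diff_text.splitlines():
        if current is not None and (old_left > 0 or new_left > 0):
            # Inside a hunk body: update the remaining old/new line counts.
            if line.startswith("+"):
                new_left -= 1
            elif line.startswith("-"):
                old_left -= 1
            elif line.startswith("\\"):
                pass  # "\ No newline at end of file" consumes no line
            else:
                old_left -= 1  # context line (may be empty if whitespace was stripped)
                new_left -= 1
            current.lines.append(line)
            continue
        # Outside a hunk: skip file headers ("--- a/...", "+++ b/...") until a hunk header.
        m = HUNK_HEADER.match(line)
        if m:
            current = Hunk(old_start=int(m.group(1)), new_start=int(m.group(3)))
            hunks.append(current)
            old_left = int(m.group(2) or 1)
            new_left = int(m.group(4) or 1)
    return hunks


sample = "\n".join([
    "--- a/app.py",
    "+++ b/app.py",
    "@@ -1,3 +1,3 @@",
    " import os",
    "-def old_name():",
    "+def new_name():",
    "     pass",
    "@@ -10,2 +10,3 @@ def helper():",
    "     x = 1",
    "+    log(x)",
    "     return x",
])
for h in split_hunks(sample):
    print(h.old_start, h.removed, h.added)
```

Counting lines from the header, rather than treating every line that starts with `-` or `+` as a change, keeps file headers from leaking into the previous hunk. In an LLM labeling setup, each resulting `Hunk` would then be the unit passed to the model together with its surrounding context.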