2026-05-25 | infra, data, code

Beyond Summaries: Structure-Aware Labeling of Code Changes with Large Language Models

Bar Weiss, Antonio Abu-Nassar, Adi Sosnovich, Karen Yorav

Key claim

LLM-based labeling significantly improves code review efficiency.

This paper presents an approach to improving code review efficiency by using large language models to label the code changes in a patch. On a manually curated benchmark, the method achieves a recall of 0.84 and a precision of 0.81, suggesting it can effectively complement traditional static analysis workflows.

Novelty

7.5/10

The paper introduces a novel two-stage pipeline for taxonomy-based labeling of code changes, extending LLM applications in code review.

Reliability

8.0/10

Evaluating multiple LLMs on a curated benchmark gives the claims solid experimental support.

Deep reliability assessment

The methodology supports effective LLM-based labeling of code changes, but the paper may overclaim how well the results generalize across programming languages and change types, given the limited dataset used for evaluation.

Reproducibility

Partially. The dataset is manually curated and includes both natural and synthetic patches, and an official figshare link is provided, but no separate code repository is mentioned.

Discussion questions

What assumptions are made about the language-agnostic capabilities of the LLMs in diverse programming environments?
How can the findings be practically applied to improve existing code review processes in software development teams?
What specific conditions or changes in the dataset would lead to a significant drop in the reported performance metrics?

Key figure

Figure 1 illustrates the labeling process of diff hunks, showing how labels are assigned based on the changes detected in the code.
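The card does not describe the pipeline's internals. To illustrate the unit being labeled in Figure 1, here is a minimal Python sketch, assuming standard unified-diff input, that splits a patch into hunks and leaves a `labels` slot for a labeler to fill. The `Hunk` class and `split_hunks` function are hypothetical names, and the LLM labeling step itself is not shown.

```python
import re
from dataclasses import dataclass, field

# Unified-diff hunk header: "@@ -old_start[,old_len] +new_start[,new_len] @@"
HUNK_HEADER = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")


@dataclass
class Hunk:
    file: str
    old_start: int
    new_start: int
    lines: list[str] = field(default_factory=list)
    # Hypothetical slot for taxonomy labels; a labeler (e.g. an LLM call) would fill it.
    labels: set[str] = field(default_factory=set)


def split_hunks(patch: str) -> list[Hunk]:
    """Split a unified diff into hunks, the unit that gets labeled."""
    hunks: list[Hunk] = []
    current_file = ""
    old_left = new_left = 0  # lines still expected in the current hunk body
    for line in patch.splitlines():
        if old_left > 0 or new_left > 0:
            tag = line[:1]
            if tag == "\\":
                continue  # "\ No newline at end of file" marker, not content
            if tag == "-":
                old_left -= 1
            elif tag == "+":
                new_left -= 1
            else:
                # Context line (" ", or "" if trailing whitespace was stripped)
                old_left -= 1
                new_left -= 1
            hunks[-1].lines.append(line)
        elif line.startswith("+++ "):
            # "+++ b/path" names the file after the change
            current_file = line[4:].removeprefix("b/")
        elif m := HUNK_HEADER.match(line):
            # An omitted length in the header means a length of 1
            old_left = int(m.group(2) or 1)
            new_left = int(m.group(4) or 1)
            hunks.append(Hunk(current_file, int(m.group(1)), int(m.group(3))))
    return hunks
```

Ending each hunk by the line counts in its header, rather than by spotting a leading `---`, keeps a removed line such as `--- legacy comment` (a deleted SQL comment) from being mistaken for a file header.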

Benchmark results

Manually curated benchmark of natural and synthetic patches: Recall 0.84 (SOTA comparison: not specified)

Manually curated benchmark of natural and synthetic patches: Precision 0.81 (SOTA comparison: not specified)
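The card does not state how these scores are aggregated. A minimal sketch, assuming they are micro-averaged over per-hunk label sets (an assumption; the paper may aggregate differently), shows how such precision and recall figures are computed. The function name and label names are hypothetical.

```python
def micro_precision_recall(
    gold: list[set[str]], predicted: list[set[str]]
) -> tuple[float, float]:
    """Micro-averaged precision and recall over per-hunk label sets.

    gold[i] and predicted[i] hold the labels for the same hunk.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must cover the same hunks")
    tp = fp = fn = 0
    for g, p in zip(gold, predicted):
        tp += len(g & p)  # labels both annotated and predicted
        fp += len(p - g)  # predicted labels the annotator did not assign
        fn += len(g - p)  # annotated labels the model missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical label names, not taken from the paper's taxonomy
gold = [{"rename", "refactor"}, {"bugfix"}]
predicted = [{"rename"}, {"bugfix"}]
print(micro_precision_recall(gold, predicted))  # (1.0, 0.6666666666666666)
```

Here every predicted label is correct (precision 1.0), but the missed `refactor` label pulls recall down to 2/3. Micro-averaging weights each label instance equally; a macro average over label types would instead weight rare labels more heavily.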

Code link

figshare.com/s/a254f611ba26b4da18a2 (official)

Available on arXiv.