2024-12-19 | data | code

Speak-to-Structure: Evaluating LLMs in Open-domain Natural Language-Driven Molecule Generation

Jiatong Li, Junxian Li, Weida Wang, Yunqing Liu, Changmeng Zheng, Yatao Bian, Dongzhan Zhou, Xiao-yong Wei, Qing Li

Key claim

S^2-Bench enables diverse molecular generation from natural language.

The paper presents S^2-Bench, a benchmark for evaluating how well LLMs generate diverse molecular candidates from natural language prompts. Its tasks test molecule editing, optimization, and customization, and the authors report that Llama3.1-8B can outperform leading models such as GPT-4o. By shifting the focus to open-domain, one-to-many generation, the benchmark aims to better measure what LLMs can contribute to molecular discovery.

Novelty

8.0/10

The introduction of S^2-Bench represents a meaningful advancement in evaluating LLMs for molecule generation.

Reliability

7.0/10

The evaluation of 31 LLMs and the release of datasets provide a solid methodological foundation.

Deep reliability assessment

The novel benchmark gives a sound basis for evaluating LLMs on open-domain natural language-driven molecule generation. However, the paper may overclaim how structurally precise the generated molecules are, given the models' current limitations in understanding complex chemical constraints.

Reproducibility

Yes. The paper states that the code and datasets are fully accessible through the GitHub repository and Hugging Face Datasets.

Discussion questions

How might the one-to-many relationship in molecule generation challenge existing assumptions about LLM capabilities?
What are the practical implications of using LLMs for molecule discovery in real-world applications?
What specific conditions or experiments would need to be conducted to potentially falsify the claims made about LLM performance in this paper?

Key figure

Figure 1 illustrates the task composition of S^2-Bench, highlighting the three primary tasks: Molecule Editing, Molecule Optimization, and Customized Molecule Generation.

Benchmark results

S^2-Bench average weighted success rate: 39.33 (+3.41% vs Claude-3.5, SOTA)
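
This summary does not define how the average weighted success rate is computed. As a rough illustration only, the sketch below assumes each subtask's success rate (0 to 100) is scaled by a quality weight in [0, 1] and the results are averaged across subtasks. The function names, subtask labels, and all numbers are hypothetical and are not taken from the paper.

```python
from statistics import mean


def weighted_success_rate(success_rate: float, weight: float) -> float:
    """Scale a subtask success rate (0-100) by a quality weight in [0, 1].

    Assumption: the weight stands in for a per-task quality score such as
    similarity or novelty; the summary itself does not specify this.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return success_rate * weight


def average_weighted_success_rate(subtasks: dict[str, tuple[float, float]]) -> float:
    """Average the weighted success rates over all subtasks."""
    return mean(weighted_success_rate(sr, w) for sr, w in subtasks.values())


# Hypothetical (success rate, weight) pairs, for illustration only.
example = {
    "edit": (80.0, 0.5),
    "optimize": (60.0, 0.5),
    "customize": (20.0, 1.0),
}

score = average_weighted_success_rate(example)
print(f"average weighted success rate: {score:.2f}")
```

Under this reading, a model that succeeds often but produces low-quality candidates is penalized, which is why a raw success rate and a weighted one can rank models differently.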

GitHub: 1 repo

phenixace/S2-TOMG-Bench (Official)
