Frontier Paper Feed

What's worth reading today.

AI research papers scored by an LLM eval pipeline on novelty and reliability. Upvote to surface what the community should discuss.

PASS ✓
2026.05.26 · agents

GENESIS: Harnessing AI Agents for Autonomous 6G RAN Synthesis, Research, and Testing

GENESIS is an AI-agent framework that streamlines cellular R&D by converting intents into validated solutions while addressing challenges specific to Radio Access Networks (RAN). Its key result is shorter, more reliable development cycles in a field where traditional R&D is time-consuming.

Novelty 8.0 · Reliability 7.0
arxiv/2605.27360
PASS ✓
2026.05.26 · vision · code

EdgeFlow: Edge-Map Augmented VLM-Based Flowchart Processing for Industrial Requirements Engineering

EdgeFlow improves the conversion of flowcharts into machine-readable models by supplying a vision-language model (VLM) with a Canny edge map as a structural prior. It raises both node-level and edge-level F1 scores on industrial requirements-engineering flowcharts, and because it needs no annotated training data, it is practical for real-world use.

Novelty 7.5 · Reliability 8.0
arxiv/2605.27332
PASS ✓
2026.05.26 · agents

BASIS: Batchwise Advantage Estimation from Single-Rollout Information Sharing for LLM Reasoning

BASIS is an advantage-estimation algorithm for reinforcement learning on LLM reasoning tasks that shares information across the prompts in a batch to make value estimation more efficient. Using only one rollout per prompt, it reduces estimation MSE by 69% relative to a strong baseline, which yields better policy optimization with less training time.

Novelty 8.0 · Reliability 8.0
arxiv/2605.27293
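With a single rollout per prompt, a per-prompt group baseline (as used in GRPO) is not available, so any baseline has to borrow information from the other prompts in the batch. The simplest form of that idea is a leave-one-out batch baseline, sketched below. This is a generic illustration under that assumption, not the BASIS estimator, which the summary does not specify; the function name is hypothetical.

```python
from typing import List


def leave_one_out_batch_advantages(rewards: List[float]) -> List[float]:
    """Hypothetical helper: one reward per prompt (single rollout each).

    Each prompt's advantage is its reward minus the mean reward of the
    *other* prompts in the batch, so its own reward never enters its baseline.
    """
    n = len(rewards)
    if n < 2:
        raise ValueError("need at least two prompts in the batch")
    total = sum(rewards)
    # (total - r) / (n - 1) is the mean reward of the remaining prompts.
    return [r - (total - r) / (n - 1) for r in rewards]
```

Because the baseline excludes the prompt's own sample, it does not bias the policy-gradient estimate, and it can reduce variance; BASIS reportedly goes further, cutting estimation MSE by 69% against a strong baseline.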
PASS ✓
2026.05.25 · infra · code

MobileGym: A Verifiable and Highly Parallel Simulation Platform for Mobile GUI Agent Research

MobileGym is a verifiable, highly parallel simulation platform for mobile GUI agents that combines high interaction fidelity with scalable reinforcement learning. Its structured evaluation and rewards carry over to real-device execution; the key result is a +12.8 percentage-point gain on a test set.

Novelty 8.0 · Reliability 7.5
arxiv/2605.26114
PASS ✓
2026.05.25 · agents · code

From Model Scaling to System Scaling: Scaling the Harness in Agentic AI

This paper argues that progress in agentic AI depends on scaling the harness, that is, designing modular and verifiable architectures around foundation models, and not only on scaling the models themselves. It identifies bottlenecks in context governance, trustworthy memory, and skill routing, and proposes an evaluation framework that scores the quality of agent behavior rather than simple task success. Its main contribution is CheetahClaws, a reference harness for evaluating such architectures.

Novelty 8.0 · Reliability 7.0
arxiv/2605.26112
PASS ✓
2026.05.25 · infra · code

Prism: A Plug-in Reproducible Infrastructure for Scalable Multimodal Continual Instruction Tuning

The paper presents Prism, a codebase for scalable Multimodal Continual Instruction Tuning (MCIT) research. Because methods plug in independently, it reduces implementation overhead and increases code reuse, with the goal of speeding up the development of new MCIT strategies.

Novelty 8.0 · Reliability 7.5
arxiv/2605.26110
PASS ✓
2026.05.25 · infra · code

Beyond Summaries: Structure-Aware Labeling of Code Changes with Large Language Models

This paper uses large language models to assign structure-aware labels to the code changes in a patch, going beyond free-text summaries to make code review more efficient. The method achieves high recall and precision, suggesting it can strengthen traditional static-analysis workflows.

Novelty 7.5 · Reliability 8.0
arxiv/2605.26100
PASS ✓
2026.05.25 · data · code

Forgetting in Language Models: Capacity, Optimization, and Self-Generated Replay

This paper studies forgetting in language models fine-tuned on new tasks and shows that self-generated samples can serve as replay data, substantially reducing forgetting. The key result is that this enables high-learning-rate fine-tuning without the usual forgetting tradeoff.

Novelty 7.5 · Reliability 8.0
arxiv/2605.26097
PASS ✓
2026.05.25 · data

Active Query Synthesis for Preference Learning

This paper presents an active-learning approach that makes user preference learning more efficient while accounting for the reliability of feedback. Its main contribution is the Info-Synth framework, which synthesizes optimal queries for decision-making systems; the method applies across domains including preference learning and robotic control.

Novelty 8.0 · Reliability 7.5
arxiv/2605.26072
PASS ✓
2026.05.25 · data · code

WhoSaidIt: Human-LLM Collaborative Annotation for Text-Based Multilingual Speaker-Attribute Classification

This paper presents a human-LLM collaborative framework for re-annotating speaker attributes in multilingual text. The key finding is that speaker-attribute annotation differs substantially across languages, which exposes both the potential and the limitations of LLMs for this task.

Novelty 7.5 · Reliability 8.0
arxiv/2605.26070
PASS ✓
2026.05.25 · agents

Retrying vs Resampling in AI Control

This paper compares retrying and resampling as AI control protocols for AI coding tools. Retrying can lower a monitor's suspicion scores, but it may also let an attacker slip through sneakier attacks. The key finding is that, under resampling, auditing based on the maximum suspicion score across samples substantially improves safety without sacrificing usefulness.

Novelty 7.5 · Reliability 8.0
arxiv/2605.26047
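The max-suspicion auditing rule from the Retrying vs Resampling summary can be sketched as below. The setup is an assumption for illustration: several resampled candidate actions for one step, a trusted monitor that assigns each a suspicion score, and a fixed audit threshold; all names are hypothetical, and the mean-based rule is included only for contrast.

```python
from statistics import mean
from typing import Callable, Sequence


def audit_by_max(
    samples: Sequence[str], monitor: Callable[[str], float], threshold: float
) -> bool:
    """Audit if ANY resampled action looks suspicious enough."""
    # A single high-suspicion sample triggers review on its own.
    return max(monitor(s) for s in samples) >= threshold


def audit_by_mean(
    samples: Sequence[str], monitor: Callable[[str], float], threshold: float
) -> bool:
    """Contrast rule: audit only if the AVERAGE suspicion is high."""
    # Benign resamples can dilute one suspicious sample under this rule.
    return mean([monitor(s) for s in samples]) >= threshold
```

With suspicion scores of 0.1, 0.2, and 0.9 and a threshold of 0.8, the max rule audits (0.9 ≥ 0.8) while the mean rule does not (0.4 < 0.8), which illustrates how taking the maximum keeps one sneaky resample from being averaged away.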
PASS ✓
2026.05.25 · alignment

Causal methods for LLM development and evaluation

This paper argues for integrating causal methods into the development and evaluation of large language models, showing how they can address confounding and make conclusions about LLMs more reliable. Its central claim is that causal methods clarify the effects of interventions in LLM training and evaluation.

Novelty 8.0 · Reliability 7.0
arxiv/2605.25998
PASS ✓
2026.05.22 · agents · code

SkillOpt: Executive Strategy for Self-Evolving Agent Skills

SkillOpt is an optimizer for self-evolving agent skills that applies controlled optimization in text space. It substantially improves accuracy for a range of models and execution environments across multiple benchmarks.

Novelty 8.0 · Reliability 8.0
arxiv/2605.23904
PASS ✓
2026.05.22 · scaling

LLMs as Noisy Channels: A Shannon Perspective on Model Capacity and Scaling Laws

The paper proposes the Shannon Scaling Law, which models LLM training as information transmission over a noisy channel. A key result is that when the signal-to-noise ratio is not kept high enough, performance degrades, and the framework predicts this degradation well.

Novelty 8.0 · Reliability 8.0
arxiv/2605.23901
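As background for the channel analogy, the classical Shannon-Hartley theorem gives the capacity C of a band-limited channel with bandwidth B and signal-to-noise ratio S/N. The summary does not state how the paper maps these quantities onto LLM training, so this is context for the analogy, not the Shannon Scaling Law itself.

```latex
C = B \log_2\!\left(1 + \frac{S}{N}\right)
```

Capacity grows only logarithmically in S/N and shrinks toward zero as S/N approaches zero, which is consistent with the summary's claim that an insufficient signal-to-noise ratio degrades performance.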
PASS ✓
2026.05.22 · agents · code

From Raw Experience to Skill Consumption: A Systematic Study of Model-Generated Agent Skills

This paper studies the lifecycle of skills in language agents, from extraction out of raw experience to consumption. A key finding is that model-generated skills usually improve performance but can cause negative transfer, so a skill's utility varies across the models that consume it. The authors propose a meta-skill that improves skill extraction and reduces negative transfer.

Novelty 8.0 · Reliability 7.0
arxiv/2605.23899
PASS ✓
2026.05.22 · reasoning

SPACENUM: Revisiting Spatial Numerical Understanding in VLMs

This paper tests whether vision-language models (VLMs) understand numerical values in spatial contexts. The key finding is that these models often fail to ground numbers in spatial meaning and perform close to random guessing. Tuning improves results, but explicit reasoning gives only marginal gains.

Novelty 7.5 · Reliability 6.5
arxiv/2605.23898
PASS ✓
2026.05.22 · reasoning · code

ETCHR: Editing To Clarify and Harness Reasoning

The paper presents ETCHR, an image-editing model designed to improve visual reasoning in multimodal large language models (MLLMs). It substantially improves reasoning accuracy across a range of tasks and with several different MLLMs.

Novelty 8.0 · Reliability 7.0
arxiv/2605.23897
PASS ✓
2026.05.22 · scaling

Complete-muE: Optimal Hyperparameter Transfer and Scaling for MoE Models

Complete-muE is a framework for efficient hyperparameter transfer from dense models to Mixture-of-Experts (MoE) models. It keeps hyperparameter optimization stable across architectures and substantially speeds up convergence without extensive hyperparameter searches. The key result is that hyperparameters tuned on a single dense model transfer effectively to all MoE configurations.

Novelty 8.0 · Reliability 7.5
arxiv/2605.23893
PASS ✓
2026.05.22 · vision · code

Good Token Hunting: A Hitchhiker's Guide to Token Selection for Visual Geometry Transformers

This paper presents a two-stage token-selection framework that makes visual geometry transformers more efficient for 3D reconstruction. By reducing the number of tokens each query interacts with, the method accelerates processing by over 85% while maintaining or improving performance.

Novelty 8.0 · Reliability 7.0
arxiv/2605.23892
PASS ✓
2026.05.22 · data

CHRONOS: Temporally-Aware Multi-Agent Coordination for Evolving Data Marketplaces

CHRONOS is a multi-agent architecture that improves recall and privacy in temporal knowledge-graph data marketplaces. It reaches a recall of 0.937 while keeping query performance competitive and preserving privacy guarantees, making it a promising option for marketplaces whose data and privacy constraints evolve over time.

Novelty 8.0 · Reliability 7.5
arxiv/2605.23887
PASS ✓
2026.05.22 · data

Multilingual Knowledge Transfer under Data Constraints via Lexical Interventions

The LINK method improves cross-lingual knowledge transfer by making lexical substitutions in high-resource training data. It requires only a bilingual vocabulary, yields significant improvements on downstream tasks, and reaches equivalent performance with up to a 2x speedup in training time.

Novelty 7.5 · Reliability 8.0
arxiv/2605.23885
PASS ✓
2026.05.22 · vision

PGT: Procedurally Generated Tasks for improving visual grounding in MLLMs

This paper introduces Procedurally Generated Tasks (PGT) to improve fine-grained visual understanding in multimodal large language models (MLLMs). The key result is that instruction tuning on PGT data improves performance by up to +20% on the What'sUp benchmark, indicating that better supervision can address spatial-reasoning deficits.

Novelty 8.0 · Reliability 7.5
arxiv/2605.23883
PASS ✓
2026.05.22 · data

On the Stability of Spherical Hellinger-Kantorovich Flows and Their Implications for Differential Privacy

This paper develops a perturbation theory for spherical Hellinger-Kantorovich gradient flows, which makes it possible to compare flows driven by different potentials. A key result is a set of uniform bounds on log-likelihood ratios and divergences, which can be used to improve sampling methods for differential privacy.

Novelty 7.5 · Reliability 8.0
arxiv/2605.23879
PASS ✓
2026.05.22 · scaling

Training-Free Looped Transformers

This paper presents a training-free method that improves pretrained transformers by applying a looping strategy at inference time. It reports gains on several benchmarks, including +2.64 percentage points on MMLU-Pro for Qwen3-4B-Instruct.

Novelty 8.0 · Reliability 7.0
arxiv/2605.23872
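The general idea of inference-time looping can be sketched with toy layers: run the early layers once, repeat a contiguous middle block several times, then run the remaining layers, all without changing any weights. Which block to loop and how many times are hypothetical parameters here; the summary does not say how the paper chooses them, and real layers would be transformer blocks rather than the toy functions in the example.

```python
from typing import Callable, List, Sequence

Vector = List[float]
Layer = Callable[[Vector], Vector]


def looped_forward(
    layers: Sequence[Layer],
    x: Vector,
    loop_start: int,
    loop_end: int,
    n_loops: int,
) -> Vector:
    """Apply layers[:loop_start] once, layers[loop_start:loop_end] n_loops
    times, then layers[loop_end:] once. No parameters are updated."""
    if not 0 <= loop_start <= loop_end <= len(layers) or n_loops < 1:
        raise ValueError("invalid loop block or loop count")
    for layer in layers[:loop_start]:
        x = layer(x)
    # Re-apply the same block; its weights are shared and left unchanged.
    for _ in range(n_loops):
        for layer in layers[loop_start:loop_end]:
            x = layer(x)
    for layer in layers[loop_end:]:
        x = layer(x)
    return x
```

With four toy layers that each add 1.0, looping layers 1 and 2 twice applies six layers in total, so an input of [0.0] becomes [6.0]; with n_loops=1 the function reduces to the ordinary forward pass and returns [4.0].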
PASS ✓
2026.05.22 · scaling

Move on Muon: A Hamiltonian probability gradient flow perspective of Muon optimizer

This paper derives a gradient flow for matrix-valued parameters from a regularized version of the Muon optimizer and interprets it as damped Hamiltonian dynamics. The key result is that these dynamics dissipate energy and, under stated conditions, admit convergence rates, which could inform neural-network training.

Novelty 8.0 · Reliability 7.0
arxiv/2605.23871
PASS ✓
2026.05.22 · reasoning

Human Decision-Making with Persuasive and Narrative LLM Explanations

This study examines how LLM-generated narrative explanations affect human decision-making. The key finding is that persuasive narratives do not significantly improve decision accuracy over AI predictions alone, and they may even slow responses.

Novelty 6.0 · Reliability 7.0
arxiv/2605.23867
PASS ✓
2026.05.22 · reasoning

Leveraging Foundation Models for Causal Generative Modeling

FM-CGM is a framework for visual causal reasoning built on pretrained foundation models. It supports zero-shot causal discovery and counterfactual generation, which is valuable for applications that need reliable causal inference; a key result is that it identifies plausible causal structures.

Novelty 8.0 · Reliability 7.0
arxiv/2605.23861
PASS ✓
2026.05.22 · scaling · code

Strong Teacher Not Needed? On Distillation in LLM Pretraining

This study finds that even weaker teachers can improve larger student models when an appropriate mix of losses is used. Stronger teachers do not always help more: a teacher with too many parameters or too much training can reduce the benefit of distillation. Distillation also improves generalization more than it improves in-domain fitting.

Novelty 7.5 · Reliability 7.0
arxiv/2605.23857
PASS ✓
2026.05.22 · data

Entrywise Error Bounds for Spectral Ranking with Semi-Random Adversaries

This work studies how adversarial (semi-random) sampling of comparisons affects spectral algorithms for Bradley-Terry-Luce (BTL) estimation. The key finding is that reweighting the observed edges brings performance back to the level achieved on uniformly sampled comparison graphs, which matters for practitioners ranking items from biased data.

Novelty 7.5 · Reliability 7.0
arxiv/2605.23854
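For readers unfamiliar with spectral ranking, the sketch below implements a Rank Centrality-style estimator for BTL scores: it builds a Markov chain in which each item moves toward the items that beat it, and reads the scores off the stationary distribution. The optional `weights` argument is a hypothetical hook for per-edge reweighting; the paper's actual reweighting scheme is not described in the summary, and the sketch assumes a connected comparison graph.

```python
from typing import Dict, List, Optional, Tuple

Pair = Tuple[int, int]


def rank_centrality(
    n: int,
    wins: Dict[Pair, int],
    weights: Optional[Dict[Pair, float]] = None,
    iters: int = 2000,
) -> List[float]:
    """Estimate BTL scores for items 0..n-1 from pairwise comparison counts.

    wins[(i, j)] is the number of times item i beat item j.
    weights[(a, b)] with a < b is an optional per-edge weight (hypothetical
    reweighting hook); edges without an entry get weight 1.0.
    """
    # Undirected edges (a, b), a < b, with at least one comparison.
    edges = {(min(i, j), max(i, j)) for (i, j), c in wins.items() if c > 0}
    w = {e: (1.0 if weights is None else weights.get(e, 1.0)) for e in edges}

    # Divide by the largest total edge weight at any node so rows stay stochastic.
    load = [0.0] * n
    for (a, b), we in w.items():
        load[a] += we
        load[b] += we
    d = max(load)

    # P[i][j]: probability of moving from i to j, proportional to how often j beat i.
    P = [[0.0] * n for _ in range(n)]
    for (a, b), we in w.items():
        a_wins = wins.get((a, b), 0)
        b_wins = wins.get((b, a), 0)
        total = a_wins + b_wins
        P[a][b] += we * b_wins / (total * d)
        P[b][a] += we * a_wins / (total * d)
    for i in range(n):
        P[i][i] = 1.0 - sum(P[i][j] for j in range(n) if j != i)

    # Power iteration on the lazy chain (I + P) / 2: same stationary
    # distribution as P, but guaranteed aperiodic.
    pi = [1.0 / n] * n
    for _ in range(iters):
        step = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
        pi = [0.5 * p + 0.5 * s for p, s in zip(pi, step)]
    return pi
```

On three items where each stronger item wins 3 of 4 comparisons against each weaker one, the estimated scores come out ordered 0 > 1 > 2 and sum to 1.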
PASS ✓
2026.05.22 · vision · code

Decomposing Queries into Tool Calls for Long-Video Keyframe Retrieval

ToolMerge is a keyframe-retrieval method that uses LLMs to improve frame selection for long-video question answering. It decomposes each query into tool calls and merges their results, improving caption retrieval by 5% over existing methods and providing verifiable visual evidence for different types of queries.

Novelty 8.0 · Reliability 7.0
arxiv/2605.23826
PASS ✓
2026.05.22 · alignment · code

It's the humans, not the data: Geopolitical bias in LLMs originates in post-training, amplified by the language of the prompt

The study finds that geopolitical biases in language models originate mainly in post-training rather than pre-training, and that the language of the prompt amplifies them. Notably, Alibaba's model showed a marked shift in bias toward China after post-training, which underscores the need for oversight of alignment processes.

Novelty 8.0 · Reliability 7.5
arxiv/2605.23825
PASS ✓
2026.05.22 · reasoning

Hierarchical Concept Geometry in Language Models Emerges from Word Co-occurrence

This paper offers a theory of how the relationship between general and specific concepts is represented geometrically in language models. The key finding is that the structure of word embeddings reflects a hierarchy mirroring taxonomic relationships, which emerges from word co-occurrence and can be observed in both word2vec and Gemma 2B embeddings.

Novelty 8.0 · Reliability 7.5
arxiv/2605.23821
PASS ✓
2026.05.22 · vision

Not Too Generative, Not Too Discriminative: The Human Alignment Sweet Spot

This study examines how balancing discriminative and generative learning affects how human-like visual representations are. The key finding is that alignment with humans peaks at intermediate points on this continuum, suggesting that a hybrid approach works better than either extreme.

Novelty 8.0 · Reliability 7.0
arxiv/2605.23819
PASS ✓
2026.05.22 · agents

Advanced AI Service Provisioning in O-RAN through LLM Engine Integration

This paper introduces a Dual-Brain architecture for O-RAN in which LLMs orchestrate data collection and deployment while an automated ML engine trains classifiers on demand. The key result is a streamlined, more efficient path to developing AI applications for real-time RAN control.

Novelty 8.0 · Reliability 7.0
arxiv/2605.23809
PASS ✓
2026.05.22 · vision · code

Debiased Negative Mining Improves Out-of-distribution Detection with Pre-trained Vision-Language Models

This paper presents an out-of-distribution (OOD) detection approach built on pre-trained vision-language models. Its key contribution is a method for debiasing negative-label mining, which substantially improves OOD detection across a variety of setups.

Novelty 8.0 · Reliability 7.0
arxiv/2605.23797
PASS ✓
2026.05.22 · multimodal · code

Beyond Binary Edits: Robust Multimodal Knowledge Editing with Adversarial Subspace Alignment

This paper addresses how to update knowledge in multimodal large language models without losing existing capabilities. The authors propose techniques that improve how well knowledge edits generalize, keeping predictions consistent across semantically similar inputs; a key result is that adversarial variants make knowledge editing more robust.

Novelty 8.0 · Reliability 7.0
arxiv/2605.23780
PASS ✓
2026.05.21 · scaling

Is Capability a Liability? More Capable Language Models Make Worse Forecasts When It Matters Most

The paper shows that more capable language models forecast worse on quantities with superlinear growth and tail risk, particularly in the upper tail of the distribution. This inverse-scaling effect suggests that more capable models misestimate extreme outcomes while remaining accurate in the lower tail. The authors recommend continuous accuracy measures for evaluating LLM forecasts.

Novelty 8.0 · Reliability 7.0
arxiv/2605.22672
PASS ✓
2026.05.19 · data

Semi-Parametric Bayesian Additive Regression Trees for Risk Prediction with High-Dimensional Epigenetic Signatures and Low-Dimensional Covariates

The spBART model combines interpretable low-dimensional covariates with complex high-dimensional predictors. In multiple myeloma studies it identifies important genomic loci and achieves high out-of-sample discrimination (AUC = 0.96).

Novelty 8.0 · Reliability 7.0
arxiv/2605.20143
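The AUC quoted for spBART has a simple probabilistic reading: the chance that a randomly chosen positive case receives a higher risk score than a randomly chosen negative case, with ties counting as half. A minimal sketch of that pairwise (Mann-Whitney) computation follows; it is quadratic in the number of cases and meant as an illustration, not the paper's evaluation code.

```python
from typing import Sequence


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Fraction of (positive, negative) pairs ranked correctly; ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    correct = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                correct += 1.0
            elif p == q:
                correct += 0.5
    return correct / (len(pos) * len(neg))
```

Read this way, an AUC of 0.96 means the model ranks a random positive above a random negative in 96% of such pairs.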
PASS ✓
2026.05.18 · data · code

Benchmarking Commercial ASR Systems on Code-Switching Speech: Arabic, Persian, and German

This study benchmarks five commercial ASR systems on code-switched speech in Arabic, Persian, and German. ElevenLabs Scribe v2 performs best, with the lowest word error rate (WER) and the highest BERTScore, and quality differs substantially across systems.

Novelty 7.5 · Reliability 8.0
arxiv/2605.19069
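Word error rate, the main metric in this benchmark, is the word-level edit distance between a reference transcript and the ASR output, divided by the number of reference words. The sketch below computes it with the standard dynamic program. It splits on whitespace only, whereas real evaluations usually normalize casing, punctuation, and script-specific spelling first, which matters for Arabic and Persian; the example sentences are made up, and BERTScore is not reproduced here.

```python
from typing import List


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref: List[str] = reference.split()
    hyp: List[str] = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j]: edit distance between the first i-1 reference words and first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            )
        prev = curr
    return prev[-1] / len(ref)
```

For the code-switched reference "ich habe das meeting verpasst", the hypothesis "ich habe das meeting vergessen" has one substitution in five words, giving a WER of 0.2.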
PASS ✓
2026.05.18 · agents

Entropy-Gradient Inversion: Moving Toward Internal Mechanism of Large Reasoning Models

This paper improves reasoning in Large Reasoning Models by exploiting a correlation between token entropy and logit gradients. The key result is that the proposed method, CorR-PO, consistently outperforms existing techniques, and that stronger entropy inversion goes with better reasoning performance.

Novelty 8.0 · Reliability 7.5
arxiv/2605.17770
PASS ✓
2026.05.13 · reasoning · code

Mechanistic Interpretability of EEG Foundation Models via Sparse Autoencoders

This paper presents a framework for interpreting EEG foundation models that uses sparse autoencoders to extract sparse feature dictionaries and grounds those features in clinical taxonomies. A key result is the identification of operational regimes that expose critical representational failures, which bear directly on clinical trust in model predictions.

Novelty 8.0 · Reliability 7.0
arxiv/2605.13930
PASS ✓
2026.04.28 · agents

Ceci n'est pas une explication: Evaluating Explanation Failures as Explainability Pitfalls in Language Learning Systems

The paper shows how AI language-learning tools can give misleading feedback that reinforces learners' misconceptions. It introduces L2-Bench, a benchmark that assesses AI feedback quality along six dimensions, and identifies 'explainability pitfalls' that can harm learning outcomes.

Novelty 8.0 · Reliability 7.0
arxiv/2604.26145
PASS ✓
2026.04.27 · agents · code

SUDP: Secret-Use Delegation Protocol for Agentic Systems

The paper formalizes the Agent Secret Use (ASU) problem: the security risk that arises when agentic systems act with user secrets. It proposes the Secret-Use Delegation Protocol (SUDP), which lets secret-dependent operations be performed without granting reusable authority to untrusted requesters, so that user-authorized actions are carried out safely.

Novelty 8.0 · Reliability 8.0
arxiv/2604.24920
PASS ✓
2026.04.18 · data

RoIt-XMASA: Multi-Domain Multilingual Sentiment Analysis Dataset for Romanian and Italian

RoIt-XMASA is a multi-domain sentiment-analysis dataset of 36,000 labeled reviews in Italian and Romanian. The accompanying adversarial training framework improves sentiment discrimination while keeping representations invariant to language and domain, reaching an F1 score of 66.23% with XLM-R.

Novelty 7.5 · Reliability 7.0
arxiv/2604.17134
PASS ✓
2026.03.24 · agents

Safe Reinforcement Learning with Preference-based Constraint Inference

This study presents Preference-based Constrained Reinforcement Learning (PbCRL), which infers safety constraints from human preferences. A key result is that PbCRL aligns more closely with the true safety requirements and outperforms existing methods on both safety and reward metrics.

Novelty 8.0 · Reliability 7.5
arxiv/2603.23565
PASS ✓
2026.03.08 · vision

Compression as Adaptation: Implicit Visual Representation with Diffusion Foundation Models

This paper presents a visual representation framework, built on diffusion foundation models, that encodes signals as functions and thereby enables efficient video compression. The key result is that an 81-frame video can be hashed into a compact vector while compression performance remains controllable.

Novelty 8.0 · Reliability 7.0
arxiv/2603.07615
PASS ✓
2026.03.07 · alignment · code

Entropy-Aware On-Policy Distillation of Language Models

The paper presents Entropy-Aware On-Policy Distillation, which improves knowledge transfer between language models by balancing precision and diversity. It reports significant accuracy gains across several benchmarks, indicating that accounting for teacher uncertainty improves student-teacher alignment.

Novelty 8.0 · Reliability 7.5
arxiv/2603.07079
PASS ✓
2026.02.17 · DI-ENS · data

Certified Per-Instance Unlearning Using Individual Sensitivity Bounds

This work presents a certified machine-unlearning method that calibrates noise adaptively to each data point's individual sensitivity. The key result is that certified unlearning then requires significantly less injected noise than traditional methods, which makes it more practical; the findings are supported by both theoretical analysis and experiments.

Novelty 8.0 · Reliability 7.0
arxiv/2602.15602
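To see why per-instance sensitivity reduces noise, recall that the classic Gaussian mechanism scales its noise linearly with sensitivity. The sketch below assumes that classic calibration, sigma = sensitivity * sqrt(2 ln(1.25/delta)) / epsilon (stated for 0 < epsilon < 1), and assumes a per-instance sensitivity bound is already available for the point being removed. The paper's own bounds and certification argument may differ, and both function names are hypothetical.

```python
import math
import random
from typing import List, Sequence


def gaussian_sigma(sensitivity: float, epsilon: float, delta: float) -> float:
    """Classic Gaussian-mechanism noise scale; assumes 0 < epsilon < 1."""
    if sensitivity < 0:
        raise ValueError("sensitivity must be non-negative")
    if not (0.0 < epsilon < 1.0 and 0.0 < delta < 1.0):
        raise ValueError("classic bound requires 0 < epsilon < 1 and 0 < delta < 1")
    return sensitivity * math.sqrt(2.0 * math.log(1.25 / delta)) / epsilon


def release_unlearned(
    params: Sequence[float],
    sensitivity: float,
    epsilon: float,
    delta: float,
    rng: random.Random,
) -> List[float]:
    """Hypothetical release step: add Gaussian noise calibrated to the given
    (per-instance or worst-case) sensitivity to each model parameter."""
    sigma = gaussian_sigma(sensitivity, epsilon, delta)
    return [p + rng.gauss(0.0, sigma) for p in params]
```

If the worst-case sensitivity over all points is 1.0 but a particular point's individual bound is 0.1, the per-instance noise scale is one tenth of the worst-case scale at the same (epsilon, delta).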
PASS ✓
2026.02.13 · data

Linear Regression with Unknown Truncation Beyond Gaussian Features

This paper presents an algorithm for truncated linear regression that remains efficient even when the survival set is unknown and the features are not Gaussian. Its runtime is polynomial in the dimension and the desired accuracy, which makes it more practical for real-world applications. The approach also contributes to positive-only PAC learning.

Novelty 8.0 · Reliability 7.0
arxiv/2602.12534
PASS ✓
2026.01.07 · agents · code

R³L: Reflect-then-Retry Reinforcement Learning with Language-Guided Exploration, Pivotal Credit, and Positive Amplification

R³L improves reinforcement learning by synthesizing high-quality trajectories with a reflect-then-retry approach: language feedback is used to correct errors, which improves exploration, exploitation, and training stability. The key result is a 5% to 52% relative improvement over existing methods.

Novelty 8.0 · Reliability 7.0
arxiv/2601.03715
PASS ✓
2025.12.22 · scaling

On the Koopman-Based Generalization Bounds for Multi-Task Deep Learning

This paper presents a framework for generalization bounds in multi-task deep neural networks. Using Koopman-based operator-theoretic techniques and a tailored Sobolev space, the authors obtain tighter bounds that remain effective even in the single-output case, which improves theoretical understanding and offers flexibility for multi-task deep learning.

Novelty 8.0 · Reliability 7.0
arxiv/2512.19199
PASS ✓
2025.12.22 · scaling

Operator-Based Generalization Bound for Deep Learning: Insights on Multi-Task Learning

This paper develops operator-based generalization bounds for vector-valued neural networks, with a focus on multi-task learning. The key result is a set of sketching techniques that improve computational efficiency while retaining performance guarantees.

Novelty 8.0 · Reliability 7.0
arxiv/2512.19184
PASS ✓
2025.12.08 · vision

DFIR-DETR: Frequency-Domain Iterative Refinement and Dynamic Feature Aggregation for Small Object Detection

DFIR-DETR improves small-object detection by addressing weaknesses in attention mechanisms and feature upsampling. With a compact 11.7M-parameter model, it reaches an mAP50 (mean average precision at an IoU threshold of 0.5) of 92.9% on NEU-DET and 51.6% on VisDrone.

Novelty 8.0 · Reliability 7.0
arxiv/2512.07078
PASS ✓
2025.11.19 · infra · code

DCC: Data-Centric Compilation of Machine Learning Kernels for Processing-In-Memory Architectures

DCC is an ML compiler for processing-in-memory (PIM) devices that optimizes both data rearrangement and compute code. It achieves up to a 13.17x speedup on specific PIM architectures compared with GPU-only execution.

Novelty 8.0 · Reliability 8.0
arxiv/2511.15503
PASS ✓
2025.09.08 · data

Are Targeted Data Poisoning Attacks as Effective as We Think?

The paper presents a way to identify the samples that are easiest and hardest to poison in targeted data-poisoning attacks. Using information from a clean model, it enables better evaluation of attack effectiveness and proactive defenses; a key result is that samples can be reliably stratified by their vulnerability to poisoning.

Novelty 8.0 · Reliability 7.5
arxiv/2509.06896
PASS ✓
2025.08.19 · reasoning · code

Interactive Query Answering on Knowledge Graphs with Soft Entity Constraints

This paper presents a query-answering approach for knowledge graphs that supports soft entity constraints, letting users express preferences. The key result is that the proposed methods keep performance robust while adding minimal overhead, enabling more flexible, interactive use of graph databases.

Novelty 8.0 · Reliability 7.0
arxiv/2508.13663
PASS ✓
2025.08.17 · data · code

STM3: Mixture of Multiscale Mamba for Long-Term Spatio-Temporal Time-Series Prediction

STM3 uses a mixture-of-multiscale-Mamba architecture to capture complex long-term spatio-temporal dependencies. On the PEMSD8 dataset it improves MAE by 7.1% over the second-best model.

Novelty 8.0 · Reliability 8.0
arxiv/2508.12247
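The STM3 summary's "7.1% in MAE" is most naturally read as a relative reduction in mean absolute error versus the second-best model; the summary does not say so explicitly, so treat that reading as an assumption. The sketch below computes both quantities; the numbers in the usage note are illustrative, not results from the paper.

```python
from typing import Sequence


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error between targets and predictions."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def relative_improvement(baseline_error: float, new_error: float) -> float:
    """Fractional error reduction relative to the baseline (0.071 means 7.1%)."""
    if baseline_error <= 0:
        raise ValueError("baseline error must be positive")
    return (baseline_error - new_error) / baseline_error
```

For example, a baseline MAE of 10.0 against a new MAE of 9.29 gives a relative improvement of 0.071, that is, 7.1%.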
PASS ✓
2025.06.25 · infra

Physics-Informed Machine Learning Regulated by Finite Element Analysis for Simulation Acceleration of Melt Pool Dynamics in Laser Powder Bed Fusion

The FEA-PINN framework substantially reduces computational cost while keeping accuracy comparable to traditional finite element analysis (FEA) when simulating melt-pool dynamics in laser powder bed fusion (LPBF). It tracks material state during laser melting and incorporates multiple physical phenomena.

Novelty 8.0 · Reliability 8.0
arxiv/2506.20537
PASS ✓
2024.12.19 · data · code

Speak-to-Structure: Evaluating LLMs in Open-domain Natural Language-Driven Molecule Generation

The paper presents S²-Bench, a benchmark for evaluating how well LLMs generate diverse molecular candidates from natural-language prompts in an open-domain setting. Its tasks cover molecule editing, optimization, and customization, and the results show that Llama3.1-8B can outperform leading models such as GPT-4o.

Novelty 8.0 · Reliability 7.0
arxiv/2412.14642
PASS ✓
2024.06.05 · data

Nonlinear Transformations Against Unlearnable Datasets

This research introduces a nonlinear-transformation framework that lets deep neural networks learn from data previously deemed unlearnable. Accuracy on unlearnable CIFAR10 datasets improves by 0.34% to 249.59%, indicating that current protection methods may be insufficient.

Novelty 7.5 · Reliability 7.0
arxiv/2406.02883