← Back to feed
2026-02-13data

Linear Regression with Unknown Truncation Beyond Gaussian Features

Alexandros Kouridakis, Anay Mehrotra, Alkis Kalavasis, Constantine Caramanis

Key claim

First efficient algorithm for unknown survival sets in regression.

This paper presents a novel algorithm for truncated linear regression that operates efficiently even when the survival set is unknown. It achieves a polynomial runtime with respect to the number of dimensions and desired accuracy, making it more practical for real-world applications. The approach also contributes to positive-only PAC learning, which could be beneficial for future research.

Novelty
8.0/10

The algorithm introduces a new approach for learning unknown survival sets in truncated linear regression.

Reliability
7.0/10

The methodology is solid, but relies on specific assumptions about feature distributions.

Deep reliability assessment

The methodology provides a polynomial-time algorithm for truncated linear regression with unknown survival sets, which is a significant advancement over previous methods that required strong assumptions about the feature distribution. However, the claim that it operates under only mild assumptions may overlook the complexities involved in real-world applications where data may not meet these assumptions.

Reproducibility

Yes, the authors mention that the simulation code can be found in a repository, although specific details about the dataset used are not provided.

Discussion questions

  1. How would the results change if the underlying distribution of the feature vectors deviates significantly from sub-Gaussianity?
  2. What are the potential real-world scenarios where this algorithm could be applied, and what challenges might arise in those contexts?
  3. If the survival set were to be estimated inaccurately, what specific outcomes or metrics would indicate a failure of the algorithm?

Key figure

Figure 1 illustrates the process of learning a survival set from positive samples, demonstrating how the algorithm discards intervals with the most samples from a reference distribution.

Read on arXiv →
Linear Regression with Unknown Truncation Beyond Gaussian Features — Frontier Papers