Abstract
Selecting prototypical examples from a source distribution to represent a target data distribution is a fundamental problem in machine learning. Existing subset selection methods often rely on implicit importance scores, which can be skewed towards majority classes and lead to low-quality prototypes for minority classes.
We present UniPROT, a subset selection framework that minimizes the optimal transport distance between a uniformly weighted prototypical distribution and the target distribution. While this formulation is natural, it yields a cardinality-constrained maximization of a super-additive objective that is generally hard to approximate efficiently.
UniPROT resolves this by reformulating the OT marginal constraints to obtain a submodular objective based on partial optimal transport. This yields a greedy algorithm with a (1 - 1/e) approximation guarantee relative to the original super-additive objective, while remaining scalable in practice.
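To make the inherited guarantee concrete, here is a minimal sketch of cardinality-constrained greedy maximization of a monotone submodular set function, the classical setting in which the (1 - 1/e) bound holds. The facility-location objective below is a generic stand-in, not UniPROT's partial-OT objective; all names here are illustrative.

```python
import numpy as np

def greedy_select(f, n_items, k):
    """Greedily grow S by the element with largest marginal gain, |S| <= k.

    For monotone submodular f with f({}) = 0, this achieves at least
    (1 - 1/e) of the optimal value (Nemhauser, Wolsey & Fisher, 1978).
    """
    S = []
    for _ in range(k):
        gains = [(f(S + [i]) - f(S), i) for i in range(n_items) if i not in S]
        _, best = max(gains)  # pick the item with the largest marginal gain
        S.append(best)
    return S

# Toy data: 40 source points; similarity via a Gaussian kernel (nonnegative).
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 2))
sim = np.exp(-np.square(X[:, None] - X[None, :]).sum(-1))

def facility_location(S):
    # f(S) = sum_j max_{i in S} sim[i, j]; monotone and submodular, f({}) = 0.
    return float(sim[S].max(axis=0).sum()) if S else 0.0

prototypes = greedy_select(facility_location, len(X), k=5)
```

Each greedy step costs one pass over the remaining candidates; lazy evaluation of marginal gains is the usual way to scale this further.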
Across imbalanced classification benchmarks, and in large language model fine-tuning and pretraining under domain imbalance, UniPROT consistently improves minority-class representation without sacrificing majority-class accuracy.