Efficiency Learning Project Page

AISTATS 2026 Poster

UniPROT: Uniform Prototype Selection via Partial Optimal Transport with Submodular Guarantees

Prateek Chanda, Prayas Agrawal, Karthik S. Gurumoorthy, Ganesh Ramakrishnan, Bamdev Mishra, Pratik Jawanpuria

A theoretically grounded prototype selection framework that enforces uniform source contributions and improves minority-class representation under imbalance.

Venue AISTATS 2026
Track Poster
Focus Prototype Selection
Core Idea Partial OT + Submodularity

Abstract

Selecting prototypical examples from a source distribution to represent a target data distribution is a fundamental problem in machine learning. Existing subset selection methods often rely on implicit importance scores, which can be skewed towards majority classes and lead to low-quality prototypes for minority classes.

We present UniPROT, a subset selection framework that minimizes the optimal transport distance between a uniformly weighted prototypical distribution and the target distribution. While this formulation is natural, it yields a cardinality-constrained maximization of a super-additive objective that is generally hard to approximate efficiently.

UniPROT resolves this by reformulating the OT marginal constraints to obtain a partial optimal transport based submodular objective. This leads to a greedy algorithm with a (1 - 1 / e) approximation guarantee relative to the original super-additive objective while remaining scalable in practice.

Across imbalanced classification benchmarks and large language model finetuning and pretraining settings under domain imbalance, UniPROT consistently improves minority-class representation without sacrificing majority-class accuracy.

Highlights

  • Introduces a uniform-weighted prototype selection objective grounded in optimal transport.
  • Transforms an intractable super-additive maximization problem into a submodular one via partial OT constraints.
  • Provides a greedy approximation guarantee of (1 - 1 / e).
  • Improves representation of minority classes in imbalanced settings.
  • Demonstrates gains for both classical benchmarks and LLM data selection regimes.

BibTeX

@article{chanda2026uniprot,
  title={UniPROT: Uniform Prototype Selection via Partial Optimal Transport with Submodular Guarantees},
  author={Chanda, Prateek and Agrawal, Prayas and Gurumoorthy, Karthik S. and Ramakrishnan, Ganesh and Mishra, Bamdev and Jawanpuria, Pratik},
  journal={arXiv preprint arXiv:2604.10952},
  year={2026},
  url={https://arxiv.org/abs/2604.10952}
}