DECILE Libraries & Tools

DECILE provides a comprehensive suite of open-source libraries for data-efficient machine learning.

CORDS - Coreset & Data Subset Selection

Reduce training time, in some cases from days to hours, by training on small, informative subsets of the data chosen with state-of-the-art subset selection algorithms.

Key Features:

  • Subset selection strategies including GLISTER, GradMatch, and CRAIG
  • PyTorch integration
  • Supports supervised, semi-supervised, and self-supervised learning
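
To give a feel for what gradient-based subset selection does, here is a minimal pure-Python sketch. It is not the CORDS API and not the GradMatch algorithm itself; it is a simplified greedy heuristic that picks samples whose average gradient stays close to the full-data average gradient. The function name `greedy_gradient_match` and the toy per-sample gradients are assumptions made for illustration.

```python
def greedy_gradient_match(grads, budget):
    """Greedily pick `budget` sample indices whose mean gradient
    best approximates the mean gradient of the full dataset.

    Hypothetical helper for illustration only; not part of CORDS.
    """
    n = len(grads)
    dim = len(grads[0])
    # Target: average gradient over all samples.
    target = [sum(g[d] for g in grads) / n for d in range(dim)]

    selected = []
    running_sum = [0.0] * dim  # sum of gradients of the selected samples
    for _ in range(min(budget, n)):
        best_idx, best_err = None, float("inf")
        k = len(selected) + 1  # subset size if we add one more sample
        for i in range(n):
            if i in selected:
                continue
            # Squared distance between the candidate subset mean and the target.
            err = sum(
                ((running_sum[d] + grads[i][d]) / k - target[d]) ** 2
                for d in range(dim)
            )
            if err < best_err:
                best_idx, best_err = i, err
        selected.append(best_idx)
        running_sum = [running_sum[d] + grads[best_idx][d] for d in range(dim)]
    return selected


# Toy per-sample gradients (made up for this example).
toy_grads = [[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0], [0.1, 0.0]]
subset = greedy_gradient_match(toy_grads, budget=2)
print(subset)
```

In a real training loop the gradients would come from the model, and the selected subset would typically be refreshed every few epochs as the model changes.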

DISTIL - Active Learning Library

Achieve high model performance with minimal labeled data through intelligent active learning.

Key Features:

  • State-of-the-art active learning strategies
  • Modular design for easy integration
  • Fast PyTorch implementations
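
As an illustration of the idea behind active learning, the sketch below implements entropy-based uncertainty sampling in plain Python: the unlabeled points whose predicted class distribution is most uncertain are sent for labeling first. This is one common strategy, shown under simplifying assumptions; the function names and the toy probabilities are hypothetical and do not reflect the DISTIL API.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_most_uncertain(unlabeled_probs, budget):
    """Return indices of the `budget` unlabeled points with highest entropy.

    Hypothetical helper for illustration only; not part of DISTIL.
    """
    ranked = sorted(
        range(len(unlabeled_probs)),
        key=lambda i: entropy(unlabeled_probs[i]),
        reverse=True,
    )
    return ranked[:budget]


# Toy softmax outputs for four unlabeled points (made up for this example).
toy_probs = [
    [0.98, 0.01, 0.01],  # confident prediction
    [0.34, 0.33, 0.33],  # nearly uniform, most uncertain
    [0.60, 0.30, 0.10],
    [0.50, 0.50, 0.00],
]
to_label = select_most_uncertain(toy_probs, budget=2)
print(to_label)
```

In practice the selected points are labeled, added to the training set, and the model is retrained before the next selection round.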

SubmodLib - Submodular Optimization

Efficient submodular function optimization for data summarization and selection.

Key Features:

  • Multiple submodular functions (Facility Location, Graph Cut, etc.)
  • Optimized C++ backend with Python interface
  • Applications in video summarization, document summarization, and more
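
For readers new to submodular selection, the following pure-Python sketch shows naive greedy maximization of a Facility Location function, f(S) = Σ_i max_{j∈S} sim(i, j), which rewards subsets that represent every point well. It is a teaching sketch, not the SubmodLib implementation; the Gaussian-style similarity, the function names, and the toy points are assumptions chosen for illustration.

```python
import math


def similarity(a, b):
    """Similarity in (0, 1]: exp of the negative squared Euclidean distance."""
    d2 = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-d2)


def greedy_facility_location(points, budget):
    """Naive greedy maximization of the Facility Location function.

    Hypothetical helper for illustration only; not part of SubmodLib.
    """
    n = len(points)
    sim = [[similarity(points[i], points[j]) for j in range(n)] for i in range(n)]
    best_cover = [0.0] * n  # best similarity of each point to the selected set
    selected = []
    for _ in range(min(budget, n)):
        best_j, best_gain = None, -1.0
        for j in range(n):
            if j in selected:
                continue
            # Marginal gain of adding j: total improvement in coverage.
            gain = sum(max(sim[i][j] - best_cover[i], 0.0) for i in range(n))
            if gain > best_gain:
                best_j, best_gain = j, gain
        selected.append(best_j)
        best_cover = [max(best_cover[i], sim[i][best_j]) for i in range(n)]
    return selected


# Two well-separated toy clusters (made up for this example).
toy_points = [(0, 0), (0.1, 0), (0, 0.1), (5, 5), (5.1, 5), (5, 5.1)]
summary = greedy_facility_location(toy_points, budget=2)
print(summary)
```

With a budget of two, the greedy pass picks one representative from each cluster, which is the behavior that makes Facility Location useful for summarization.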

SPEAR - Data Programming

Reduce labeling costs through programmatic weak supervision and data programming.

Key Features:

  • Implements Snorkel, ImplyLoss, and other data programming approaches
  • Semi-supervised learning integration
  • Label aggregation and denoising
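
The core idea of data programming can be sketched in a few lines of Python: several noisy labeling functions vote on each example, and their votes are aggregated into a single label. The sketch below uses plain majority voting, which is only a baseline and not the aggregation model used by Snorkel or ImplyLoss. The labeling functions, label values, and example texts are hypothetical and are not taken from SPEAR.

```python
from collections import Counter

ABSTAIN = -1  # a labeling function returns this when it has no opinion
SPAM, HAM = 1, 0


# Hypothetical labeling functions for a toy spam task.
def lf_contains_free(text):
    return SPAM if "free" in text.lower() else ABSTAIN


def lf_contains_meeting(text):
    return HAM if "meeting" in text.lower() else ABSTAIN


def lf_many_exclamations(text):
    return SPAM if text.count("!") >= 2 else ABSTAIN


def majority_vote(votes):
    """Aggregate votes, ignoring abstentions; ties and empty votes abstain."""
    counts = Counter(v for v in votes if v != ABSTAIN)
    if not counts:
        return ABSTAIN
    top = counts.most_common(2)
    if len(top) == 2 and top[0][1] == top[1][1]:
        return ABSTAIN
    return top[0][0]


def aggregate(texts, labeling_functions):
    """Apply every labeling function to every text and aggregate per text."""
    return [majority_vote([lf(t) for lf in labeling_functions]) for t in texts]


toy_texts = [
    "Free prize!! Click now!",
    "Meeting moved to 3pm",
    "See you tomorrow",
    "Free meeting room available",
]
weak_labels = aggregate(
    toy_texts, [lf_contains_free, lf_contains_meeting, lf_many_exclamations]
)
print(weak_labels)
```

Examples where the functions abstain or disagree stay unlabeled here; more sophisticated aggregation methods instead model the accuracy of each labeling function to resolve such cases.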

Publications using this software

  1. Double-Hit Gene Expression Signature Defines a Distinct Subgroup of Germinal Center B-Cell-Like Diffuse Large B-Cell Lymphoma
  2. Genome-wide discovery of somatic regulatory variants in diffuse large B-cell lymphoma
  3. Targeted error-suppressed quantification of circulating tumor DNA using semi-degenerate barcoded adapters and biotinylated baits
  4. Enhancing knowledge discovery from cancer genomics data with Galaxy
