Software
DECILE Libraries & Tools
DECILE provides a comprehensive suite of open-source libraries for data-efficient machine learning.
CORDS - Coreset & Data Subset Selection
Reduce training time from days to hours using state-of-the-art data subset selection algorithms.
| GitHub Repository | Documentation |
Key Features:
- Selection strategies including GLISTER, GradMatch, and CRAIG
- PyTorch integration
- Supports supervised, semi-supervised, and self-supervised learning
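To give a feel for what gradient-based subset selection does, here is a minimal sketch in plain Python. It is not the CORDS API and not the GradMatch algorithm itself: it is a simplified greedy stand-in that picks samples whose mean gradient stays close to the mean gradient of the full dataset. The function name `greedy_gradient_match` and the toy gradient values are assumptions made for illustration.

```python
from typing import List, Sequence


def greedy_gradient_match(grads: Sequence[Sequence[float]], budget: int) -> List[int]:
    """Greedily pick `budget` indices whose mean gradient approximates
    the mean gradient over all samples (squared Euclidean error)."""
    n, dim = len(grads), len(grads[0])
    # Mean gradient of the full dataset: the quantity the subset should match.
    target = [sum(g[d] for g in grads) / n for d in range(dim)]

    selected: List[int] = []
    running_sum = [0.0] * dim  # sum of gradients of the selected samples
    for _ in range(min(budget, n)):
        best_idx, best_err = -1, float("inf")
        k = len(selected) + 1  # subset size if one more sample is added
        for i in range(n):
            if i in selected:
                continue
            # Error of the subset mean if sample i were added.
            err = sum(
                ((running_sum[d] + grads[i][d]) / k - target[d]) ** 2
                for d in range(dim)
            )
            if err < best_err:
                best_idx, best_err = i, err
        selected.append(best_idx)
        running_sum = [running_sum[d] + grads[best_idx][d] for d in range(dim)]
    return selected


# Toy per-sample gradients (hypothetical values, 2-D for readability).
toy_grads = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5], [-1.0, 0.0]]
subset = greedy_gradient_match(toy_grads, budget=2)
```

In a real training loop the model would then be trained on the selected indices for some number of epochs before the subset is re-selected; how often to re-select is a tuning choice this sketch leaves out.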
DISTIL - Active Learning Library
Achieve high model performance with minimal labeled data by labeling only the examples that active learning strategies rank as most informative.
| GitHub Repository | Documentation |
Key Features:
- State-of-the-art active learning strategies
- Modular design for easy integration
- Fast PyTorch implementations
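As an illustration of the idea behind active learning, the sketch below implements entropy-based uncertainty sampling, one of the simplest query strategies. It uses only the Python standard library and does not reflect DISTIL's actual classes or function names; `select_most_uncertain` and the example probabilities are hypothetical.

```python
import math
from typing import List, Sequence


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of one predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def select_most_uncertain(pred_probs: Sequence[Sequence[float]], budget: int) -> List[int]:
    """Return the indices of the `budget` unlabeled examples whose
    predictions have the highest entropy (i.e. the model is least sure)."""
    scores = [entropy(p) for p in pred_probs]
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:budget]


# Hypothetical softmax outputs of a model on four unlabeled examples.
pool_probs = [
    [0.98, 0.01, 0.01],  # confident prediction
    [0.34, 0.33, 0.33],  # nearly uniform, most uncertain
    [0.70, 0.20, 0.10],
    [0.50, 0.49, 0.01],
]
query = select_most_uncertain(pool_probs, budget=2)
```

The selected indices would be sent to an annotator, added to the labeled set, and the model retrained before the next round of selection.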
SubmodLib - Submodular Optimization
Efficient submodular function optimization for data summarization and selection.
| GitHub Repository | Documentation |
Key Features:
- Multiple submodular functions (Facility Location, Graph Cut, etc.)
- Optimized C++ backend with Python interface
- Applications in video summarization, document summarization, and more
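The Facility Location function mentioned above scores a subset S by how well it represents every item: f(S) is the sum, over all items i, of the largest similarity between i and any member of S. The following is a minimal pure-Python sketch of greedy maximization of that function. It does not use SubmodLib's interface, and the function name and toy similarity matrix are assumptions; it also assumes all similarities are non-negative.

```python
from typing import List, Sequence


def facility_location_greedy(sim: Sequence[Sequence[float]], budget: int) -> List[int]:
    """Greedily maximize f(S) = sum_i max_{j in S} sim[i][j].

    `sim` is an n x n matrix of non-negative pairwise similarities.
    """
    n = len(sim)
    best_cover = [0.0] * n  # best similarity of each item to the current selection
    selected: List[int] = []
    for _ in range(min(budget, n)):
        best_j, best_gain = -1, -1.0
        for j in range(n):
            if j in selected:
                continue
            # Marginal gain: how much adding j improves each item's coverage.
            gain = sum(max(sim[i][j] - best_cover[i], 0.0) for i in range(n))
            if gain > best_gain:
                best_j, best_gain = j, gain
        selected.append(best_j)
        best_cover = [max(best_cover[i], sim[i][best_j]) for i in range(n)]
    return selected


# Toy similarity matrix with two clusters: items {0, 1} and items {2, 3}.
toy_sim = [
    [1.0, 0.9, 0.1, 0.1],
    [0.9, 1.0, 0.1, 0.1],
    [0.1, 0.1, 1.0, 0.8],
    [0.1, 0.1, 0.8, 1.0],
]
summary = facility_location_greedy(toy_sim, budget=2)
```

With a budget of two, the greedy pass picks one representative from each cluster, which is the behavior that makes this function useful for summarization.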
SPEAR - Data Programming
Reduce labeling costs through programmatic weak supervision and data programming.
| GitHub Repository | Documentation |
Key Features:
- Implements Snorkel, ImplyLoss, and other data programming approaches
- Semi-supervised learning integration
- Label aggregation and denoising
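To make the data programming workflow concrete, here is a small standard-library sketch: a few labeling functions vote on each example, and their votes are aggregated into one weak label. The aggregation shown is plain majority voting, which is a much simpler baseline than the Snorkel or ImplyLoss models, and none of the names below come from SPEAR; the labeling functions and example texts are hypothetical.

```python
from collections import Counter
from typing import Callable, List, Optional

ABSTAIN = None  # a labeling function returns this when it has no opinion
SPAM, HAM = 1, 0

LabelingFunction = Callable[[str], Optional[int]]


def lf_contains_free(text: str) -> Optional[int]:
    return SPAM if "free" in text.lower() else ABSTAIN


def lf_contains_url(text: str) -> Optional[int]:
    return SPAM if "http" in text.lower() else ABSTAIN


def lf_greeting(text: str) -> Optional[int]:
    return HAM if text.lower().startswith(("hi", "hello")) else ABSTAIN


def majority_vote(text: str, lfs: List[LabelingFunction]) -> Optional[int]:
    """Aggregate labeling-function votes; abstain on no votes or a tie."""
    votes = [v for v in (lf(text) for lf in lfs) if v is not ABSTAIN]
    if not votes:
        return ABSTAIN
    counts = Counter(votes).most_common()
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return ABSTAIN  # conflicting evidence of equal strength
    return counts[0][0]


lfs = [lf_contains_free, lf_contains_url, lf_greeting]
weak_labels = [
    majority_vote(t, lfs)
    for t in [
        "Get a FREE prize at http://example.com",  # two SPAM votes
        "Hello, lunch tomorrow?",                   # one HAM vote
        "Meeting moved to noon",                    # every function abstains
        "Hi, free tickets inside",                  # one HAM, one SPAM: tie
    ]
]
```

Examples that end up with `ABSTAIN` are simply left unlabeled; learned aggregation models improve on this baseline by estimating how reliable each labeling function is.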
Publications using this software
- Double-Hit Gene Expression Signature Defines a Distinct Subgroup of Germinal Center B-Cell-Like Diffuse Large B-Cell Lymphoma
- Genome-wide discovery of somatic regulatory variants in diffuse large B-cell lymphoma
- Targeted error-suppressed quantification of circulating tumor DNA using semi-degenerate barcoded adapters and biotinylated baits
- Enhancing knowledge discovery from cancer genomics data with Galaxy
Data Sets
Contributors
- Chris Rushton
- Bruno Grande
- Prasath Pararajalingam
- Aixiang Jiang
- Marco Albuquerque
- Elie Ritch
- Selin Jessa