Project overview
Delfin-NG Solves the GPU Resource Crunch in Enterprise Environments
Delfin-NG is a distributed platform for enterprise environments that ties together scattered GPU resources from different Kubernetes clusters or standalone hosts and makes them available for GPU-based workloads.
Objectives
The key objectives of our Delfin-NG research are to:
- Incentivize GPU resource owners to temporarily make their resources available while still preserving control over them.
- Leverage under-utilized GPU resources in the enterprise across multiple geographical regions.
- Enable resource utilization that is open, available, and auditable for transparency.
- Develop a system that is agnostic to GPU workload type, including Machine Learning (ML), video processing, etc.
- Adhere to regional data and locality restrictions as well as regulatory constraints.
- Manage GPU resources with widely varying capabilities, such as architecture, number of cores, memory capacity, etc.
- Account for the varying network latencies among GPU nodes.
- Support trialing various incentivization schemes.
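To make the placement-related objectives concrete (regional restrictions, heterogeneous capabilities, network latency, and owner control), here is a minimal Python sketch of how a pooled scheduler might filter candidate GPUs for a job. It is an illustration only, not Delfin-NG's actual design: the names `GpuNode`, `Workload`, and `eligible_nodes`, all of their fields, and the sample node names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GpuNode:
    """A GPU offered to the pool (all fields are hypothetical)."""
    name: str
    region: str
    architecture: str
    memory_gb: int
    latency_ms: float       # assumed: measured network latency to the requester
    available: bool = True  # assumed: the owner can withdraw the node at any time


@dataclass(frozen=True)
class Workload:
    """Placement requirements of a GPU job (hypothetical)."""
    allowed_regions: frozenset
    min_memory_gb: int
    max_latency_ms: float


def eligible_nodes(nodes, workload):
    """Return the nodes that satisfy every constraint, lowest latency first."""
    matches = [
        n for n in nodes
        if n.available                                # owner still offers the node
        and n.region in workload.allowed_regions      # data locality / regulation
        and n.memory_gb >= workload.min_memory_gb     # capability check
        and n.latency_ms <= workload.max_latency_ms   # latency budget
    ]
    return sorted(matches, key=lambda n: n.latency_ms)


# Made-up sample pool mixing Kubernetes-hosted and standalone GPUs.
nodes = [
    GpuNode("k8s-eu-1", "eu", "ampere", 40, 12.0),
    GpuNode("host-us-7", "us", "hopper", 80, 95.0),
    GpuNode("k8s-eu-2", "eu", "turing", 16, 8.0),
    GpuNode("host-eu-3", "eu", "hopper", 80, 30.0, available=False),
    GpuNode("k8s-eu-4", "eu", "hopper", 80, 5.0),
]
job = Workload(allowed_regions=frozenset({"eu"}), min_memory_gb=32, max_latency_ms=50.0)

print([n.name for n in eligible_nodes(nodes, job)])  # ['k8s-eu-4', 'k8s-eu-1']
```

In this sketch the region check is a hard filter rather than a preference, reflecting the idea that regulatory constraints should never be traded off against performance. Latency is used only to rank the nodes that remain.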
Real-world applications
Delfin-NG bundles islands of scattered GPU resources and makes them available to the enterprise community. This raises GPU utilization across the company and thus improves return on investment. It is especially useful for Large Language Models (LLMs), because it enables users to run large training jobs that could not otherwise be accommodated.