Welcome to Konduktor’s documentation!


Batch Jobs and Cluster Management for GPUs on Kubernetes
Konduktor is a platform designed for running ML batch jobs and managing GPU clusters. This documentation is targeted towards:
ML engineers and researchers launching training jobs on Konduktor, whether managed by Trainy or self-hosted
GPU cluster administrators trying to self-host Konduktor
If you are interested in our managed offering, please contact us at support@trainy.ai.
Key Features
🚀 Easy scale-out with job queueing and multi-node scheduling
# request 100 nodes for the job defined in task.yaml
$ sky launch -c dev task.yaml --num-nodes 100
☁ Multi-cloud access
# select the target cluster with --region
$ sky launch -c dev task.yaml --region gke-cluster
Custom container support
# task.yaml
resources:
  image_id: docker:nvcr.io/nvidia/pytorch:23.10-py3
run: |
  python train.py
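The features above can be combined in a single task file. The following is a minimal sketch, assuming Konduktor accepts the SkyPilot task YAML schema that the `sky launch` commands imply. The job name, node count, and accelerator type are illustrative placeholders rather than values from this documentation, and the `SKYPILOT_*` environment variables are the ones SkyPilot sets; whether Konduktor exposes them unchanged is an assumption.

```yaml
# task.yaml (illustrative sketch; field names assume the SkyPilot task schema)
name: multi-node-train        # hypothetical job name

num_nodes: 2                  # placeholder; can also be passed as --num-nodes at launch

resources:
  image_id: docker:nvcr.io/nvidia/pytorch:23.10-py3   # container image from the example above
  accelerators: H100:8        # assumption: 8 GPUs per node; adjust to your cluster

run: |
  # SKYPILOT_NODE_RANK and SKYPILOT_NUM_NODES are set by SkyPilot on each node;
  # their availability under Konduktor is assumed here, not confirmed.
  echo "node ${SKYPILOT_NODE_RANK} of ${SKYPILOT_NUM_NODES}"
  python train.py
```

A file like this would be submitted the same way as the commands above, for example with `sky launch -c dev task.yaml`.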
Managed Features and Roadmap
On-prem/reserved support - Available ✅
GCP on-demand/spot support - Available ✅
AWS on-demand/spot support - In progress 🚧
Azure on-demand/spot support - In progress 🚧
Multi-cluster submission - In progress 🚧
Documentation
Managed Konduktor
Job Scheduling
Self-hosted Cluster Administration
External Links
This project is powered by: