Welcome to Konduktor’s documentation!

Trainy

Batch Jobs and Cluster Management for GPUs on Kubernetes

Konduktor is a platform for running ML batch jobs and managing GPU clusters. Built on existing open-source tools, it abstracts away the details of resource scheduling so ML engineers can focus on modeling. For cluster administrators, Konduktor provides resource quotas and sharing between projects, along with built-in monitoring. Administrators can track cluster-wide resource utilization and pending jobs to adjust quotas according to organizational priorities and reduce idle resources, and they can observe the health of cluster GPUs and fabric.
  • Easy scale-out, job queueing, and multi-node scheduling

  • Share resources with quotas across projects via namespaces

  • Track active and pending jobs, resource utilization, power usage, and more

  • Node-level metrics for monitoring cluster health

Documentation