Autoscaling Setup

This guide explains how to configure autoscaling for MLflow to handle varying workloads efficiently, covering the Horizontal Pod Autoscaler (HPA) and cluster autoscaling strategies.

info

Production Scaling: Autoscaling is essential for production MLflow deployments to handle varying workloads and ensure optimal resource utilization.

Prerequisites

warning

Critical Requirements: Autoscaling requires specific backend configurations. Ensure all prerequisites are met before enabling autoscaling.

  • Kubernetes cluster with metrics server enabled
  • MLflow deployed on Kubernetes
  • Cluster autoscaler configured (for node scaling)
  • Resource requests and limits defined for MLflow pods
  • Backend store enabled (PostgreSQL or MySQL, not SQLite)
  • Artifact store enabled (S3, Azure Blob, or GCS)
  • Authentication configured (PostgreSQL auth backend or disabled)
tip

Resource Planning: Define appropriate resource requests and limits for your MLflow pods to enable effective autoscaling decisions.
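
A quick way to check the cluster-side prerequisites before enabling autoscaling (assumes kubectl access and that MLflow runs in the mlflow namespace):

# Confirm the metrics server is installed and serving node metrics
kubectl get deployment metrics-server -n kube-system
kubectl top nodes

# Confirm the MLflow pods define resource requests (required for utilization-based HPA)
kubectl get pods -n mlflow -o jsonpath='{range .items[*]}{.metadata.name}{": "}{.spec.containers[0].resources.requests}{"\n"}{end}'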

Horizontal Pod Autoscaler (HPA)

Prerequisites for HPA

warning

HPA Requirements: The HPA will only be created if all these conditions are met. Check your configuration carefully.

The HPA is created only if:

  • autoscaling.enabled: true
  • A backend store is enabled (backendStore.postgres.enabled or backendStore.mysql.enabled)
  • An artifact store is enabled (artifactRoot.azureBlob.enabled, artifactRoot.s3.enabled, or artifactRoot.gcs.enabled)
  • Auth is either enabled with Postgres (auth.enabled and auth.postgres.enabled) or disabled (auth.enabled: false)
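
On an existing release, you can inspect the merged chart values to confirm these conditions (using the release name mlflow in namespace mlflow, as elsewhere in this guide):

helm get values mlflow -n mlflow --all | grep -E -A 5 'autoscaling|backendStore|artifactRoot|auth'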

1. Basic HPA Configuration

Create values-autoscaling.yaml:

backendStore:
  databaseMigration: true
  databaseConnectionCheck: true
  postgres:
    enabled: true
    host: postgresql-instance1.cg034hpkmmjt.eu-central-1.rds.amazonaws.com
    port: 5432
    database: mlflow
    user: mlflowuser
    password: Pa33w0rd!

artifactRoot:
  s3:
    enabled: true
    bucket: my-mlflow-artifact-root-backend
    awsAccessKeyId: a1b2c3d4
    awsSecretAccessKey: a1b2c3d4

autoscaling:
  enabled: true
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80

2. Deploy with Autoscaling

helm install mlflow community-charts/mlflow \
  --namespace mlflow \
  -f values-autoscaling.yaml
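
Then verify that the chart created the HPA and that it can read metrics:

kubectl get hpa -n mlflow
kubectl describe hpa -n mlflow

The TARGETS column should show live utilization figures; <unknown> usually means the metrics server is missing or the pods have no resource requests.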

3. Custom Scaling Behavior

For more sophisticated scaling behavior (Kubernetes 1.18+):

autoscaling:
  enabled: true
  minReplicas: 2
  maxReplicas: 15
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 30
      policies:
        - type: Percent
          value: 100
          periodSeconds: 60
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
        - type: Percent
          value: 10
          periodSeconds: 60
tip

Scaling Behavior: Customize scaling behavior to prevent flapping during workload changes. In the example above, the scale-up policy lets the HPA at most double the replica count (100%) every 60 seconds after a 30-second stabilization window, while the scale-down policy removes at most 10% of replicas per minute after a 5-minute window.

4. Custom Metrics HPA

For scaling on application-level signals, such as ingress requests per second:

autoscaling:
  enabled: true
  minReplicas: 2
  maxReplicas: 15
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Object
      object:
        metric:
          name: requests-per-second
        describedObject:
          apiVersion: networking.k8s.io/v1
          kind: Ingress
          name: mlflow-ingress
        target:
          type: AverageValue
          averageValue: 100
info

Custom Metrics: Use custom metrics for more precise scaling decisions based on application-specific workload indicators.
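
Object and External metrics are not served by the core metrics server; they require a metrics adapter such as prometheus-adapter to be installed separately. A quick check that the custom and external metrics APIs are registered:

kubectl get apiservice v1beta1.custom.metrics.k8s.io
kubectl get apiservice v1beta1.external.metrics.k8s.io

If they are missing, kubectl describe hpa will show events like FailedGetObjectMetric or FailedGetExternalMetric.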

Complete Production Configuration

info

Production Setup: This example demonstrates a complete production-ready MLflow configuration with autoscaling enabled.

PostgreSQL with S3 and Autoscaling

backendStore:
  databaseMigration: true
  databaseConnectionCheck: true
  postgres:
    enabled: true
    host: postgresql-instance1.cg034hpkmmjt.eu-central-1.rds.amazonaws.com
    port: 5432
    database: mlflow
    user: mlflowuser
    password: Pa33w0rd!

artifactRoot:
  s3:
    enabled: true
    bucket: my-mlflow-artifact-root-backend
    awsAccessKeyId: a1b2c3d4
    awsSecretAccessKey: a1b2c3d4

auth:
  enabled: true
  adminUsername: admin
  adminPassword: S3cr3+
  postgres:
    enabled: true
    host: postgresql--auth-instance1.abcdef1234.eu-central-1.rds.amazonaws.com
    port: 5432
    database: auth
    user: mlflowauth
    password: A4m1nPa33w0rd!

autoscaling:
  enabled: true
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 30
      policies:
        - type: Percent
          value: 100
          periodSeconds: 60
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
        - type: Percent
          value: 10
          periodSeconds: 60

# Resource management for autoscaling
resources:
  requests:
    cpu: 500m
    memory: 1Gi
  limits:
    cpu: 2
    memory: 4Gi

# Health probes for reliability
livenessProbe:
  initialDelaySeconds: 30
  periodSeconds: 20
  timeoutSeconds: 6
  failureThreshold: 3

readinessProbe:
  initialDelaySeconds: 30
  periodSeconds: 20
  timeoutSeconds: 6
  failureThreshold: 3

# Monitoring integration
serviceMonitor:
  enabled: true
  namespace: monitoring
  interval: 30s
  telemetryPath: /metrics
  labels:
    release: prometheus
  timeout: 10s
  targetLabels: []
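
Deploy or update the release with this configuration (assuming the values above are saved as values-production.yaml):

helm upgrade --install mlflow community-charts/mlflow \
  --namespace mlflow \
  -f values-production.yaml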

MinIO with MySQL and Autoscaling

backendStore:
  databaseMigration: true
  databaseConnectionCheck: true
  mysql:
    enabled: true
    host: mysql-service
    port: 3306
    database: mlflow
    user: mlflow
    password: mlflow

artifactRoot:
  s3:
    enabled: true
    bucket: mlflow
    awsAccessKeyId: minioadmin
    awsSecretAccessKey: minioadmin

extraEnvVars:
  MLFLOW_S3_ENDPOINT_URL: http://minio-service:9000

autoscaling:
  enabled: true
  minReplicas: 2
  maxReplicas: 8
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 75
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 85

resources:
  requests:
    cpu: 300m
    memory: 512Mi
  limits:
    cpu: 1
    memory: 2Gi

Cluster Autoscaling

1. AWS EKS Cluster Autoscaler

apiVersion: apps/v1
kind: Deployment
metadata:
  name: cluster-autoscaler
  namespace: kube-system
  labels:
    app: cluster-autoscaler
spec:
  replicas: 1
  selector:
    matchLabels:
      app: cluster-autoscaler
  template:
    metadata:
      labels:
        app: cluster-autoscaler
    spec:
      serviceAccountName: cluster-autoscaler
      containers:
        # Pick the image tag that matches your cluster's Kubernetes minor version
        - image: k8s.gcr.io/autoscaling/cluster-autoscaler:v1.24.0
          name: cluster-autoscaler
          resources:
            limits:
              cpu: 100m
              memory: 300Mi
            requests:
              cpu: 100m
              memory: 300Mi
          command:
            - ./cluster-autoscaler
            - --v=4
            - --stderrthreshold=info
            - --cloud-provider=aws
            - --skip-nodes-with-local-storage=false
            - --expander=least-waste
            # Replace your-cluster-name with your EKS cluster name
            - --node-group-auto-discovery=asg:tag=k8s.io/cluster-autoscaler/enabled,k8s.io/cluster-autoscaler/your-cluster-name
            - --balance-similar-node-groups
            - --skip-nodes-with-system-pods=false
          volumeMounts:
            - name: ssl-certs
              mountPath: /etc/ssl/certs/ca-bundle.crt
              readOnly: true
      volumes:
        - name: ssl-certs
          hostPath:
            path: "/etc/ssl/certs/ca-bundle.crt"

2. GKE Cluster Autoscaling

Enable cluster autoscaling in GKE:

gcloud container clusters update your-cluster \
  --enable-autoscaling \
  --min-nodes 1 \
  --max-nodes 10 \
  --zone your-zone
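
To confirm the change, list the node pools and their autoscaling settings:

gcloud container node-pools list --cluster your-cluster --zone your-zone
gcloud container clusters describe your-cluster --zone your-zone | grep -A 3 autoscaling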

MLflow-Specific Autoscaling Configuration

1. Resource Requirements

Update the MLflow deployment with proper resource requests and limits:

resources:
  requests:
    cpu: 500m
    memory: 1Gi
  limits:
    cpu: 2
    memory: 4Gi

2. Pod Disruption Budget

Create a Pod Disruption Budget for high availability:

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: mlflow-pdb
  namespace: mlflow
spec:
  minAvailable: 1
  selector:
    matchLabels:
      app: mlflow
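
Verify that the budget matches the MLflow pods (this assumes they carry the app: mlflow label used in the selector; adjust to your chart's generated labels if the ALLOWED DISRUPTIONS column stays at 0):

kubectl get pdb mlflow-pdb -n mlflow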

3. Advanced HPA with Custom Metrics

Using Prometheus metrics for MLflow-specific scaling:

autoscaling:
  enabled: true
  minReplicas: 2
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: External
      external:
        metric:
          name: mlflow_active_experiments
          selector:
            matchLabels:
              app: mlflow
        target:
          type: AverageValue
          averageValue: 10
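
An External metric like mlflow_active_experiments only resolves if a metrics adapter exposes it from your monitoring system. As a hypothetical sketch, assuming MLflow exports an mlflow_active_experiments gauge to Prometheus and you deploy prometheus-adapter via its Helm chart, the mapping might look like:

# prometheus-adapter Helm values (hypothetical mapping for the metric above)
rules:
  external:
    - seriesQuery: 'mlflow_active_experiments'
      resources:
        overrides:
          namespace: {resource: "namespace"}
      name:
        as: "mlflow_active_experiments"
      metricsQuery: 'sum(<<.Series>>{<<.LabelMatchers>>}) by (<<.GroupBy>>)'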

Monitoring and Alerting

1. HPA Metrics Monitoring

apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-rules
  namespace: monitoring
data:
  mlflow-hpa-rules.yaml: |
    groups:
      - name: mlflow-hpa
        rules:
          - alert: MLflowHPAScaling
            # Fires when the HPA has wanted a different replica count for 5m,
            # i.e. scaling is in progress or stuck; optionally scope it with
            # {horizontalpodautoscaler="mlflow"}
            expr: kube_horizontalpodautoscaler_status_desired_replicas != kube_horizontalpodautoscaler_status_current_replicas
            for: 5m
            labels:
              severity: warning
            annotations:
              summary: "MLflow HPA is scaling"
              description: "HPA {{ $labels.horizontalpodautoscaler }} desired replicas ({{ $value }}) differ from current replicas"

2. Resource Usage Alerts

- alert: MLflowHighCPU
  # container_cpu_usage_seconds_total is a counter, so alert on its rate (cores used)
  expr: rate(container_cpu_usage_seconds_total{container="mlflow"}[5m]) > 0.8
  for: 5m
  labels:
    severity: warning
  annotations:
    summary: "MLflow high CPU usage"
    description: "MLflow container CPU usage is {{ $value }} cores"

- alert: MLflowHighMemory
  expr: container_memory_usage_bytes{container="mlflow"} > 3.5e9
  for: 5m
  labels:
    severity: warning
  annotations:
    summary: "MLflow high memory usage"
    description: "MLflow container memory usage is {{ $value }} bytes"

Performance Optimization

1. Pod Anti-Affinity

Ensure MLflow pods are distributed across nodes:

affinity:
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          labelSelector:
            matchExpressions:
              - key: app
                operator: In
                values:
                  - mlflow
          topologyKey: kubernetes.io/hostname
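
After a scale-up, confirm that the replicas actually landed on different nodes:

kubectl get pods -n mlflow -o wide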

2. Resource Quotas

Set resource quotas for the MLflow namespace:

apiVersion: v1
kind: ResourceQuota
metadata:
  name: mlflow-quota
  namespace: mlflow
spec:
  hard:
    requests.cpu: "4"
    requests.memory: 8Gi
    limits.cpu: "8"
    limits.memory: 16Gi
    pods: "10"

3. Network Policies

Restrict network access for security:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: mlflow-network-policy
  namespace: mlflow
spec:
  podSelector:
    matchLabels:
      app: mlflow
  policyTypes:
    - Ingress
    - Egress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              name: ingress-nginx
      ports:
        - protocol: TCP
          port: 5000
  egress:
    - to:
        - namespaceSelector:
            matchLabels:
              name: database
      ports:
        - protocol: TCP
          port: 5432
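
Because Egress appears in policyTypes, all outbound traffic not explicitly allowed is denied, which would also block DNS lookups and artifact-store access. A sketch of the extra egress rules you would likely need (assumes cluster DNS runs in kube-system and the artifact store is reached over HTTPS):

  egress:
    # keep the database rule above, then additionally allow:
    # DNS resolution via cluster DNS
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
      ports:
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53
    # HTTPS egress for the S3/Azure Blob/GCS artifact store
    - ports:
        - protocol: TCP
          port: 443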

Troubleshooting

Common Issues

  1. HPA not scaling: Check metrics server and resource requests/limits
  2. Cluster autoscaler issues: Verify node groups and IAM permissions
  3. Resource starvation: Monitor cluster capacity and quotas
  4. Prerequisites not met: Ensure backend store, artifact store, and auth are properly configured

Debug Commands

# Check HPA status
kubectl get hpa -n mlflow
kubectl describe hpa mlflow-hpa -n mlflow

# Check cluster autoscaler logs
kubectl logs -f deployment/cluster-autoscaler -n kube-system

# Check resource usage
kubectl top pods -n mlflow
kubectl top nodes

# Check metrics server
kubectl get apiservice v1beta1.metrics.k8s.io

# Verify prerequisites
kubectl get configmap -n mlflow mlflow -o yaml | grep -A 10 -B 10 "backend\|artifact\|auth"

Best Practices

1. Gradual Scaling

  • Use appropriate stabilization windows
  • Implement gradual scale-down policies
  • Monitor scaling behavior and adjust thresholds

2. Resource Management

  • Set realistic resource requests and limits
  • Monitor and adjust based on actual usage patterns

3. Cost Optimization

  • Set appropriate min/max replica counts
  • Use spot instances where possible
  • Monitor and optimize resource utilization

4. High Availability

  • Deploy across multiple availability zones
  • Use pod disruption budgets
  • Implement proper health checks and readiness probes

Next Steps

  • Set up comprehensive monitoring with Prometheus and Grafana
  • Configure alerting for autoscaling events
  • Implement cost monitoring and optimization
  • Set up backup and disaster recovery strategies
  • Configure ServiceMonitor for metrics collection