Azure Blob Storage Integration

This guide covers configuring MLflow to use Azure Blob Storage for artifact storage. Azure Blob Storage provides scalable, durable storage for MLflow artifacts with enterprise-grade security and compliance features.

info

Azure Integration: Azure Blob Storage is a good fit for organizations already running on Microsoft Azure, because it integrates with other Azure services and their security features.

Prerequisites

warning

Azure Setup: Confirm that your account can create storage accounts and role assignments in the target subscription before configuring Blob Storage integration.

  • Azure subscription with Storage Account access
  • Azure CLI configured or Azure credentials available
  • Kubernetes cluster with MLflow deployed
  • Storage Account with Blob service enabled

Azure Storage Account Setup

tip

Storage Configuration: Proper storage account setup ensures optimal performance and cost management for your MLflow artifacts.

1. Create Storage Account

az storage account create \
  --name yourmlflowstorage \
  --resource-group your-resource-group \
  --location eastus \
  --sku Standard_LRS \
  --kind StorageV2
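
Azure rejects storage account names that are not 3 to 24 characters of lowercase letters and digits, so it can save a round trip to check the name before running the command above. The following is a minimal sketch; the function name `is_valid_storage_account_name` is hypothetical and not part of the Azure CLI.

```shell
# Returns 0 when the name satisfies the storage account naming rules:
# 3-24 characters, lowercase letters and digits only.
# LC_ALL=C keeps the [a-z] range from matching uppercase in some locales.
is_valid_storage_account_name() {
  printf '%s' "$1" | LC_ALL=C grep -Eq '^[a-z0-9]{3,24}$'
}

for name in yourmlflowstorage Your-MLflow-Storage; do
  if is_valid_storage_account_name "$name"; then
    echo "$name: ok"
  else
    echo "$name: invalid"
  fi
done
```

The check only covers the naming rules. Whether the name is still globally available is something only Azure can answer.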

2. Create Blob Container

az storage container create \
  --account-name yourmlflowstorage \
  --name mlflow-artifacts

3. Get Storage Account Key

STORAGE_KEY=$(az storage account keys list \
  --account-name yourmlflowstorage \
  --resource-group your-resource-group \
  --query '[0].value' -o tsv)
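
The values file later in this guide accepts a `connectionString` as an alternative to `accessKey`. As a rough sketch, a connection string can be assembled from the account name and key using the standard Azure Storage format. The helper names `build_connection_string` and `has_required_fields` are hypothetical, and `PLACEHOLDER_KEY` stands in for the real key.

```shell
# Assemble an Azure Storage connection string from an account name and key.
build_connection_string() {
  printf 'DefaultEndpointsProtocol=https;AccountName=%s;AccountKey=%s;EndpointSuffix=core.windows.net\n' "$1" "$2"
}

# Succeeds only if the string carries both the AccountName and AccountKey fields.
has_required_fields() {
  case ";$1;" in
    *";AccountName="*) ;;
    *) return 1 ;;
  esac
  case ";$1;" in
    *";AccountKey="*) ;;
    *) return 1 ;;
  esac
}

# PLACEHOLDER_KEY is illustrative; in practice pass "$STORAGE_KEY".
conn=$(build_connection_string yourmlflowstorage "PLACEHOLDER_KEY")
if has_required_fields "$conn"; then
  echo "connection string looks well-formed"
fi
```

The Azure CLI can also print the string directly with `az storage account show-connection-string`, which avoids assembling it by hand.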

Authentication Options

warning

Security Best Practice: Use managed identities or service principals instead of storage account keys for production deployments.

Option 1: Storage Account Key (Development)

kubectl create secret generic azure-storage-credentials \
  --namespace mlflow \
  --from-literal=azure-storage-account-name=yourmlflowstorage \
  --from-literal=azure-storage-account-key="$STORAGE_KEY"

tip

Development Use: Storage account keys are suitable for development and testing but not recommended for production.

Option 2: Service Principal (Production)

Create a service principal with the Storage Blob Data Contributor role:

# Create service principal
# (the appId, password, and tenant fields in the output are the client ID,
# client secret, and tenant ID used below)
az ad sp create-for-rbac \
  --name mlflow-storage-sp \
  --role "Storage Blob Data Contributor" \
  --scopes /subscriptions/YOUR_SUBSCRIPTION_ID/resourceGroups/YOUR_RESOURCE_GROUP/providers/Microsoft.Storage/storageAccounts/yourmlflowstorage

# Create secret with service principal credentials
kubectl create secret generic azure-service-principal \
  --namespace mlflow \
  --from-literal=azure-client-id=YOUR_CLIENT_ID \
  --from-literal=azure-client-secret=YOUR_CLIENT_SECRET \
  --from-literal=azure-tenant-id=YOUR_TENANT_ID \
  --from-literal=azure-subscription-id=YOUR_SUBSCRIPTION_ID

Option 3: Managed Identity (AKS)

If using AKS with managed identity:

# Enable managed identity on AKS cluster
az aks update \
  --resource-group your-resource-group \
  --name your-aks-cluster \
  --enable-managed-identity

# Assign Storage Blob Data Contributor role to the kubelet managed identity
az role assignment create \
  --assignee-object-id "$(az aks show -g your-resource-group -n your-aks-cluster --query identityProfile.kubeletidentity.objectId -o tsv)" \
  --assignee-principal-type ServicePrincipal \
  --role "Storage Blob Data Contributor" \
  --scope /subscriptions/YOUR_SUBSCRIPTION_ID/resourceGroups/YOUR_RESOURCE_GROUP/providers/Microsoft.Storage/storageAccounts/yourmlflowstorage

info

AKS Best Practice: Managed identities provide the most secure and manageable way to grant Azure permissions to AKS pods.

MLflow Configuration

info

Complete Setup: This configuration shows a full MLflow setup with a PostgreSQL backend store and Azure Blob Storage for artifacts. The host, user, and password below are sample values; replace them with your own.

Option 1: Using Storage Account Key

Create values.yaml:

backendStore:
  databaseMigration: true
  postgres:
    enabled: true
    host: postgresql-instance1.cg034hpkmmjt.eu-central-1.rds.amazonaws.com
    port: 5432
    database: mlflow
    user: mlflowuser
    password: Pa33w0rd!

artifactRoot:
  azureBlob:
    enabled: true
    container: mlflow-artifacts
    storageAccount: yourmlflowstorage
    path: "" # Optional: Azure blob container folder
    accessKey: "" # Will use secret
    connectionString: "" # Alternative to accessKey

extraEnvVars:
  AZURE_STORAGE_ACCOUNT: yourmlflowstorage
  AZURE_STORAGE_KEY: "" # Will use secret
  MLFLOW_ARTIFACT_ROOT: "wasbs://mlflow-artifacts@yourmlflowstorage.blob.core.windows.net"
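
The artifact root above follows the pattern `wasbs://<container>@<storage-account>.blob.core.windows.net/<path>`. A small sketch of assembling it from the same three values used in `artifactRoot.azureBlob` is shown below; the function name `build_wasbs_uri` is hypothetical and only illustrates the URI layout.

```shell
# Build a wasbs:// artifact URI from container, storage account, and optional path.
# The body runs in a subshell (parentheses) so its variables stay local.
build_wasbs_uri() (
  container=$1
  account=$2
  path=${3:-}
  # Trim one leading and one trailing slash so the path joins cleanly.
  path=${path#/}
  path=${path%/}
  uri="wasbs://${container}@${account}.blob.core.windows.net"
  if [ -n "$path" ]; then
    uri="${uri}/${path}"
  fi
  printf '%s\n' "$uri"
)

build_wasbs_uri mlflow-artifacts yourmlflowstorage
build_wasbs_uri mlflow-artifacts yourmlflowstorage /mlflow/artifacts/
```

Generating the URI from the container and account names keeps `MLFLOW_ARTIFACT_ROOT` consistent with the `artifactRoot` settings when either value changes.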

Option 2: Using Service Principal

artifactRoot:
  azureBlob:
    enabled: true
    container: mlflow-artifacts
    storageAccount: yourmlflowstorage
    path: ""

extraEnvVars:
  AZURE_CLIENT_ID: "" # Will use secret
  AZURE_CLIENT_SECRET: "" # Will use secret
  AZURE_TENANT_ID: "" # Will use secret
  AZURE_SUBSCRIPTION_ID: "" # Will use secret
  MLFLOW_ARTIFACT_ROOT: "wasbs://mlflow-artifacts@yourmlflowstorage.blob.core.windows.net"

Option 3: Using Helm Set Commands

helm install mlflow community-charts/mlflow \
  --namespace mlflow \
  --set backendStore.databaseMigration=true \
  --set backendStore.postgres.enabled=true \
  --set backendStore.postgres.host=postgresql-instance1.cg034hpkmmjt.eu-central-1.rds.amazonaws.com \
  --set backendStore.postgres.database=mlflow \
  --set backendStore.postgres.user=mlflowuser \
  --set backendStore.postgres.password=Pa33w0rd! \
  --set artifactRoot.azureBlob.enabled=true \
  --set artifactRoot.azureBlob.container=mlflow-artifacts \
  --set artifactRoot.azureBlob.storageAccount=yourmlflowstorage \
  --set artifactRoot.azureBlob.accessKey=your-access-key

warning

Command Line Security: Avoid passing sensitive credentials via command line arguments. Use values files or secrets instead.
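
One way to follow this advice is to keep the access key in a separate values file with restrictive permissions and pass it with `-f`. The sketch below assumes the `artifactRoot.azureBlob.accessKey` key shown earlier; the function name `write_secret_values`, the file name, and `PLACEHOLDER_KEY` are illustrative.

```shell
# Write the storage access key to a Helm values file that only the current
# user can read. The body runs in a subshell so umask and variables stay local.
write_secret_values() (
  out=$1
  key=$2
  umask 077                 # files created from here on get mode 0600
  cat > "$out" <<EOF
artifactRoot:
  azureBlob:
    accessKey: "$key"
EOF
  chmod 600 "$out"          # also tighten the mode if the file already existed
)

tmpdir=$(mktemp -d)
write_secret_values "$tmpdir/secret-values.yaml" "PLACEHOLDER_KEY"
ls -l "$tmpdir/secret-values.yaml"

# Then pass the file instead of --set artifactRoot.azureBlob.accessKey=...:
#   helm install mlflow community-charts/mlflow --namespace mlflow \
#     -f values.yaml -f "$tmpdir/secret-values.yaml"
```

Unlike `--set`, the key never appears in the shell history or the process list. Delete the file once the release is installed, or keep it out of version control.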

AKS Integration with Managed Identity

If using AKS with managed identity, configure the service account:

tip

AKS Integration: Managed identities eliminate the need to manage service principal credentials and provide automatic credential rotation.

serviceAccount:
  create: true
  name: mlflow
  annotations:
    # Helm does not evaluate shell substitutions in a values file, so paste the
    # client ID here. You can look it up with:
    #   az aks show -g your-resource-group -n your-aks-cluster \
    #     --query identityProfile.kubeletidentity.clientId -o tsv
    azure.workload.identity/client-id: YOUR_MANAGED_IDENTITY_CLIENT_ID

artifactRoot:
  azureBlob:
    enabled: true
    container: mlflow-artifacts
    storageAccount: yourmlflowstorage
    path: ""
    # No credentials needed with managed identity

extraEnvVars:
  MLFLOW_ARTIFACT_ROOT: "wasbs://mlflow-artifacts@yourmlflowstorage.blob.core.windows.net"

Verification

Test Azure Storage Access

# Test from within the MLflow pod (requires the Azure CLI in the container image)
kubectl exec -it deployment/mlflow -n mlflow -- \
  az storage blob list --container-name mlflow-artifacts --account-name yourmlflowstorage

Check MLflow Logs

kubectl logs deployment/mlflow -n mlflow | grep -i "azure\|wasbs\|artifact"

Test Artifact Upload

Open the MLflow UI, create an experiment, and log a run with at least one artifact. If the integration works, the artifact appears as a blob in the mlflow-artifacts container.

Advanced Configuration

Custom Endpoint

For Azure Stack or custom endpoints:

extraEnvVars:
  AZURE_STORAGE_ENDPOINT: "https://your-custom-endpoint.com"

Azure Data Lake Storage Gen2

For ADLS Gen2 with hierarchical namespace:

artifactRoot:
  azureBlob:
    enabled: true
    container: mlflow-artifacts
    storageAccount: yourmlflowstorage
    path: ""

extraEnvVars:
  MLFLOW_ARTIFACT_ROOT: "abfss://mlflow-artifacts@yourmlflowstorage.dfs.core.windows.net"

Azure Storage Encryption

Enable customer-managed keys:

# Create key vault and key
az keyvault create --name your-keyvault --resource-group your-resource-group
az keyvault key create --vault-name your-keyvault --name storage-key --kty RSA --size 2048

# Configure storage account encryption
# (--encryption-key-vault takes the vault URI; the key is named separately)
az storage account update \
  --name yourmlflowstorage \
  --resource-group your-resource-group \
  --encryption-key-source Microsoft.Keyvault \
  --encryption-key-vault https://your-keyvault.vault.azure.net \
  --encryption-key-name storage-key

Custom Azure Blob Path

Store artifacts in a specific Azure Blob folder:

artifactRoot:
  azureBlob:
    enabled: true
    container: mlflow-artifacts
    storageAccount: yourmlflowstorage
    path: mlflow/artifacts # Optional folder path
    accessKey: your-access-key

Troubleshooting

Common Issues

  1. Authentication Failed: Check storage account key or service principal credentials
  2. Container Not Found: Ensure blob container exists and is accessible
  3. Network Issues: Check firewall rules and virtual network access
  4. Permission Denied: Verify role assignments for service principal or managed identity
  5. Connection String Issues: Verify connection string format and credentials
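
When authentication fails, it helps to know which credential source is actually in play. The sketch below reports that without printing any secret values. It assumes the precedence MLflow documents for Azure Blob Storage (`AZURE_STORAGE_CONNECTION_STRING`, then `AZURE_STORAGE_ACCESS_KEY`, otherwise the default Azure credential chain, which reads the `AZURE_CLIENT_ID`, `AZURE_CLIENT_SECRET`, and `AZURE_TENANT_ID` variables); the function name `detect_azure_auth_mode` is hypothetical.

```shell
# Report which credential source is visible in the current environment.
# Only variable presence is checked; no secret value is printed.
detect_azure_auth_mode() {
  if [ -n "${AZURE_STORAGE_CONNECTION_STRING:-}" ]; then
    echo "connection-string"
  elif [ -n "${AZURE_STORAGE_ACCESS_KEY:-}" ]; then
    echo "access-key"
  elif [ -n "${AZURE_CLIENT_ID:-}" ] && [ -n "${AZURE_CLIENT_SECRET:-}" ] && [ -n "${AZURE_TENANT_ID:-}" ]; then
    echo "service-principal"
  else
    # Nothing explicit is set, so managed identity or another default source applies.
    echo "default-credential"
  fi
}

echo "auth mode: $(detect_azure_auth_mode)"
```

Run it in a shell inside the MLflow pod. If the result is not the option you configured, the secret is probably not mounted into the container environment under the expected variable name.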

Debug Commands

# Check Azure credentials in pod (the output includes secret values)
kubectl exec -it deployment/mlflow -n mlflow -- env | grep AZURE

# Test Azure CLI connectivity
kubectl exec -it deployment/mlflow -n mlflow -- \
  az storage account show --name yourmlflowstorage

# Check MLflow Azure configuration
# (with no active run, get_artifact_uri() starts a new run and prints its artifact URI)
kubectl exec -it deployment/mlflow -n mlflow -- \
  python -c "import mlflow; print(mlflow.get_artifact_uri())"

# Test Azure Blob access
kubectl exec -it deployment/mlflow -n mlflow -- \
  az storage blob list --container-name mlflow-artifacts --account-name yourmlflowstorage

Cost Optimization

  • Use Azure Storage lifecycle management rules to move or delete old artifacts automatically
  • Configure blob tiering (Hot, Cool, Archive)
  • Monitor storage usage with Azure Monitor
  • Use Azure Storage Analytics for usage insights
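
As an illustration of the first two points, a lifecycle management policy can move older artifacts to cooler tiers. The sketch below writes such a policy to a JSON file. The rule name, the 30 and 180 day thresholds, and the function name `write_lifecycle_policy` are assumptions chosen for the example, not recommendations from this guide.

```shell
# Write a lifecycle policy that tiers blobs under the mlflow-artifacts container
# to Cool and then Archive after the given number of days without modification.
write_lifecycle_policy() (
  out=$1
  cool_days=$2
  archive_days=$3
  cat > "$out" <<EOF
{
  "rules": [
    {
      "enabled": true,
      "name": "mlflow-artifact-tiering",
      "type": "Lifecycle",
      "definition": {
        "actions": {
          "baseBlob": {
            "tierToCool": { "daysAfterModificationGreaterThan": $cool_days },
            "tierToArchive": { "daysAfterModificationGreaterThan": $archive_days }
          }
        },
        "filters": {
          "blobTypes": ["blockBlob"],
          "prefixMatch": ["mlflow-artifacts/"]
        }
      }
    }
  ]
}
EOF
)

policy_dir=$(mktemp -d)
write_lifecycle_policy "$policy_dir/policy.json" 30 180
cat "$policy_dir/policy.json"

# Apply it with:
#   az storage account management-policy create \
#     --account-name yourmlflowstorage \
#     --resource-group your-resource-group \
#     --policy @"$policy_dir/policy.json"
```

Blobs in the Archive tier must be rehydrated before they can be read, so MLflow cannot download archived artifacts directly. Drop the `tierToArchive` action if older runs still need to be browsable.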

Security Best Practices

  • Use managed identity instead of storage account keys when possible
  • Rely on Azure Storage encryption at rest (enabled by default) and add customer-managed keys where you need control over the key
  • Configure network security with private endpoints
  • Use Azure Key Vault for key management
  • Regularly rotate storage account keys
  • Enable soft delete for blob containers

Monitoring and Alerting

Storage Metrics

Enable storage analytics for monitoring:

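# --services takes service letters (b = blob); --retention is required,
# and 7 days is only an example value
az storage metrics update \
  --account-name yourmlflowstorage \
  --services b \
  --api true \
  --hour true \
  --minute true \
  --retention 7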

Next Steps