Azure Blob Storage Integration
This guide covers configuring MLflow to use Azure Blob Storage for artifact storage. Azure Blob Storage provides scalable, durable storage for MLflow artifacts with enterprise-grade security and compliance features.
Azure Integration: Azure Blob Storage is ideal for organizations using Microsoft Azure, providing seamless integration with Azure services and enterprise security features.
Prerequisites
Azure Setup: Ensure you have proper Azure access and permissions before configuring Blob Storage integration.
- Azure subscription with Storage Account access
- Azure CLI configured or Azure credentials available
- Kubernetes cluster with MLflow deployed
- Storage Account with Blob service enabled
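A quick way to confirm these prerequisites before starting (a minimal sketch, assuming the Azure CLI is already logged in and MLflow runs in the mlflow namespace):
# Confirm the CLI is authenticated and pointed at the intended subscription
az account show --output table
# Confirm the resource group you plan to use is visible
az group show --name your-resource-group --output table
# Confirm the MLflow deployment is running
kubectl get pods -n mlflow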
Azure Storage Account Setup
Storage Configuration: Proper storage account setup ensures optimal performance and cost management for your MLflow artifacts.
1. Create Storage Account
az storage account create \
--name yourmlflowstorage \
--resource-group your-resource-group \
--location eastus \
--sku Standard_LRS \
--kind StorageV2
2. Create Blob Container
az storage container create \
--account-name yourmlflowstorage \
--name mlflow-artifacts
3. Get Storage Account Key
STORAGE_KEY=$(az storage account keys list \
--account-name yourmlflowstorage \
--resource-group your-resource-group \
--query '[0].value' -o tsv)
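Optionally, confirm the key works before wiring it into Kubernetes by listing the account's containers with it:
# Sanity check: list containers using the retrieved key
az storage container list \
--account-name yourmlflowstorage \
--account-key "$STORAGE_KEY" \
--output table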
Authentication Options
Security Best Practice: Use managed identities or service principals instead of storage account keys for production deployments.
Option 1: Storage Account Key (Development)
kubectl create secret generic azure-storage-credentials \
--namespace mlflow \
--from-literal=azure-storage-account-name=yourmlflowstorage \
--from-literal=azure-storage-account-key=$STORAGE_KEY
Development Use: Storage account keys are suitable for development and testing but not recommended for production.
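You can confirm the secret exists and carries the expected keys (kubectl describe shows key names and sizes, not values):
kubectl describe secret azure-storage-credentials -n mlflow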
Option 2: Service Principal (Production)
Create a service principal with Storage Blob Data Contributor role:
# Create service principal
az ad sp create-for-rbac \
--name mlflow-storage-sp \
--role "Storage Blob Data Contributor" \
--scopes /subscriptions/YOUR_SUBSCRIPTION_ID/resourceGroups/YOUR_RESOURCE_GROUP/providers/Microsoft.Storage/storageAccounts/yourmlflowstorage
# Create secret with service principal credentials
kubectl create secret generic azure-service-principal \
--namespace mlflow \
--from-literal=azure-client-id=YOUR_CLIENT_ID \
--from-literal=azure-client-secret=YOUR_CLIENT_SECRET \
--from-literal=azure-tenant-id=YOUR_TENANT_ID \
--from-literal=azure-subscription-id=YOUR_SUBSCRIPTION_ID
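To verify the service principal can actually reach the container, sign in with it and list blobs over Azure AD authentication (a sketch; replace YOUR_CLIENT_ID, YOUR_CLIENT_SECRET, and YOUR_TENANT_ID with the values returned by az ad sp create-for-rbac):
# Sign in as the service principal
az login --service-principal \
--username YOUR_CLIENT_ID \
--password YOUR_CLIENT_SECRET \
--tenant YOUR_TENANT_ID
# List blobs with Azure AD auth (requires the Storage Blob Data Contributor role)
az storage blob list \
--container-name mlflow-artifacts \
--account-name yourmlflowstorage \
--auth-mode login \
--output table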
Option 3: Managed Identity (AKS)
If using AKS with managed identity:
# Enable managed identity on AKS cluster
az aks update \
--resource-group your-resource-group \
--name your-aks-cluster \
--enable-managed-identity
# Assign Storage Blob Data Contributor role to managed identity
az role assignment create \
--assignee-object-id $(az aks show -g your-resource-group -n your-aks-cluster --query identityProfile.kubeletidentity.objectId -o tsv) \
--role "Storage Blob Data Contributor" \
--scope /subscriptions/YOUR_SUBSCRIPTION_ID/resourceGroups/YOUR_RESOURCE_GROUP/providers/Microsoft.Storage/storageAccounts/yourmlflowstorage
AKS Best Practice: Managed identities provide the most secure and manageable way to grant Azure permissions to AKS pods.
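To double-check that the role assignment took effect, list the assignments scoped to the storage account:
az role assignment list \
--assignee $(az aks show -g your-resource-group -n your-aks-cluster --query identityProfile.kubeletidentity.objectId -o tsv) \
--scope /subscriptions/YOUR_SUBSCRIPTION_ID/resourceGroups/YOUR_RESOURCE_GROUP/providers/Microsoft.Storage/storageAccounts/yourmlflowstorage \
--output table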
MLflow Configuration
Complete Setup: This configuration demonstrates a production-ready MLflow setup with PostgreSQL backend and Azure Blob Storage.
Option 1: Using Storage Account Key
Create a values.yaml file:
backendStore:
  databaseMigration: true
  postgres:
    enabled: true
    host: your-postgres-server.postgres.database.azure.com
    port: 5432
    database: mlflow
    user: mlflowuser
    password: Pa33w0rd!
artifactRoot:
  azureBlob:
    enabled: true
    container: mlflow-artifacts
    storageAccount: yourmlflowstorage
    path: "" # Optional: Azure blob container folder
    accessKey: "" # Will use secret
    connectionString: "" # Alternative to accessKey
extraEnvVars:
  AZURE_STORAGE_ACCOUNT: yourmlflowstorage
  AZURE_STORAGE_KEY: "" # Will use secret
  MLFLOW_ARTIFACT_ROOT: "wasbs://mlflow-artifacts@yourmlflowstorage.blob.core.windows.net"
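Then install or upgrade the chart with the values file, which keeps credentials out of the command line:
helm upgrade --install mlflow community-charts/mlflow \
--namespace mlflow \
--create-namespace \
-f values.yaml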
Option 2: Using Service Principal
artifactRoot:
  azureBlob:
    enabled: true
    container: mlflow-artifacts
    storageAccount: yourmlflowstorage
    path: ""
extraEnvVars:
  AZURE_CLIENT_ID: "" # Will use secret
  AZURE_CLIENT_SECRET: "" # Will use secret
  AZURE_TENANT_ID: "" # Will use secret
  AZURE_SUBSCRIPTION_ID: "" # Will use secret
  MLFLOW_ARTIFACT_ROOT: "wasbs://mlflow-artifacts@yourmlflowstorage.blob.core.windows.net"
Option 3: Using Helm Set Commands
helm install mlflow community-charts/mlflow \
--namespace mlflow \
--set backendStore.databaseMigration=true \
--set backendStore.postgres.enabled=true \
--set backendStore.postgres.host=your-postgres-server.postgres.database.azure.com \
--set backendStore.postgres.database=mlflow \
--set backendStore.postgres.user=mlflowuser \
--set backendStore.postgres.password=Pa33w0rd! \
--set artifactRoot.azureBlob.enabled=true \
--set artifactRoot.azureBlob.container=mlflow-artifacts \
--set artifactRoot.azureBlob.storageAccount=yourmlflowstorage \
--set artifactRoot.azureBlob.accessKey=your-access-key
Command Line Security: Avoid passing sensitive credentials via command line arguments. Use values files or secrets instead.
AKS Integration with Managed Identity
If using AKS with managed identity, configure the service account:
AKS Integration: Managed identities eliminate the need to manage service principal credentials and provide automatic credential rotation.
serviceAccount:
  create: true
  name: mlflow
  annotations:
    # Client ID of the AKS kubelet identity, e.g. from:
    # az aks show -g your-resource-group -n your-aks-cluster --query identityProfile.kubeletidentity.clientId -o tsv
    azure.workload.identity/client-id: "<managed-identity-client-id>"
artifactRoot:
  azureBlob:
    enabled: true
    container: mlflow-artifacts
    storageAccount: yourmlflowstorage
    path: ""
    # No credentials needed with managed identity
extraEnvVars:
  MLFLOW_ARTIFACT_ROOT: "wasbs://mlflow-artifacts@yourmlflowstorage.blob.core.windows.net"
Verification
Test Azure Storage Access
# Test from within the MLflow pod
kubectl exec -it deployment/mlflow -n mlflow -- \
az storage blob list --container-name mlflow-artifacts --account-name yourmlflowstorage
Check MLflow Logs
kubectl logs deployment/mlflow -n mlflow | grep -i "azure\|wasbs\|artifact"
Test Artifact Upload
Access MLflow UI and create an experiment with artifacts to verify Azure Blob Storage integration.
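Alternatively, run a small smoke test from a workstation. The sketch below assumes the MLflow service is named mlflow and listens on port 5000 (adjust to your deployment), and that the client either has Azure credentials available (for example AZURE_STORAGE_ACCESS_KEY) or the server proxies artifact uploads:
# Forward the MLflow tracking service to localhost
kubectl port-forward svc/mlflow 5000:5000 -n mlflow &
# Log a small test artifact and print where it was stored
python - <<'EOF'
import mlflow

mlflow.set_tracking_uri("http://localhost:5000")
mlflow.set_experiment("azure-blob-smoke-test")
with mlflow.start_run():
    with open("hello.txt", "w") as f:
        f.write("artifact stored in Azure Blob Storage")
    mlflow.log_artifact("hello.txt")
    print("Artifact URI:", mlflow.get_artifact_uri())
EOF
The run's artifact URI should point at the wasbs:// location configured above, and the file should appear under the run's folder in the mlflow-artifacts container.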
Advanced Configuration
Custom Endpoint
For Azure Stack or custom endpoints:
extraEnvVars:
  AZURE_STORAGE_ENDPOINT: "https://your-custom-endpoint.com"
Azure Data Lake Storage Gen2
For ADLS Gen2 with hierarchical namespace:
artifactRoot:
  azureBlob:
    enabled: true
    container: mlflow-artifacts
    storageAccount: yourmlflowstorage
    path: ""
extraEnvVars:
  MLFLOW_ARTIFACT_ROOT: "abfss://mlflow-artifacts@yourmlflowstorage.dfs.core.windows.net"
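abfss:// paths require a storage account with the hierarchical namespace enabled, which can be set when the account is created:
# Create an ADLS Gen2-capable storage account (hierarchical namespace enabled)
az storage account create \
--name yourmlflowstorage \
--resource-group your-resource-group \
--location eastus \
--sku Standard_LRS \
--kind StorageV2 \
--hns true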
Azure Storage Encryption
Enable customer-managed keys:
# Create key vault (purge protection is required for customer-managed keys) and key
az keyvault create --name your-keyvault --resource-group your-resource-group --enable-purge-protection true
az keyvault key create --vault-name your-keyvault --name storage-key --kty RSA --size 2048
# Configure storage account encryption with the customer-managed key
az storage account update \
--name yourmlflowstorage \
--resource-group your-resource-group \
--encryption-key-source Microsoft.Keyvault \
--encryption-key-vault https://your-keyvault.vault.azure.net \
--encryption-key-name storage-key
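The storage account also needs an identity that is allowed to wrap and unwrap with the key; run the following before the encryption update above, otherwise it will fail (a minimal sketch using a system-assigned identity and a vault access policy, assuming the vault is not using RBAC-only authorization):
# Give the storage account a system-assigned managed identity
az storage account update \
--name yourmlflowstorage \
--resource-group your-resource-group \
--assign-identity
# Allow that identity to use the key for wrap/unwrap operations
az keyvault set-policy \
--name your-keyvault \
--object-id $(az storage account show --name yourmlflowstorage --resource-group your-resource-group --query identity.principalId -o tsv) \
--key-permissions get wrapKey unwrapKey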
Custom Azure Blob Path
Store artifacts in a specific Azure Blob folder:
artifactRoot:
  azureBlob:
    enabled: true
    container: mlflow-artifacts
    storageAccount: yourmlflowstorage
    path: mlflow/artifacts # Optional folder path
    accessKey: your-access-key
Troubleshooting
Common Issues
- Authentication Failed: Check storage account key or service principal credentials
- Container Not Found: Ensure blob container exists and is accessible
- Network Issues: Check firewall rules and virtual network access
- Permission Denied: Verify role assignments for service principal or managed identity
- Connection String Issues: Verify connection string format and credentials
Debug Commands
# Check Azure credentials in pod
kubectl exec -it deployment/mlflow -n mlflow -- env | grep AZURE
# Test Azure CLI connectivity
kubectl exec -it deployment/mlflow -n mlflow -- \
az storage account show --name yourmlflowstorage
# Check MLflow Azure configuration
kubectl exec -it deployment/mlflow -n mlflow -- \
python -c "import mlflow; print(mlflow.get_artifact_uri())"
# Test Azure Blob access
kubectl exec -it deployment/mlflow -n mlflow -- \
az storage blob list --container-name mlflow-artifacts --account-name yourmlflowstorage
Cost Optimization
- Use Azure Storage lifecycle management for cost-effective storage (see the example policy after this list)
- Configure blob tiering (Hot, Cool, Archive)
- Monitor storage usage with Azure Monitor
- Use Azure Storage Analytics for usage insights
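As an example of lifecycle management, the sketch below tiers artifact blobs to the Cool tier after 30 days and deletes them after 365 days; adjust the rules to your retention needs, and keep the delete action only if losing artifacts of older runs is acceptable:
# Save a lifecycle rule as policy.json
cat > policy.json <<'EOF'
{
  "rules": [
    {
      "enabled": true,
      "name": "mlflow-artifact-tiering",
      "type": "Lifecycle",
      "definition": {
        "filters": {
          "blobTypes": ["blockBlob"],
          "prefixMatch": ["mlflow-artifacts/"]
        },
        "actions": {
          "baseBlob": {
            "tierToCool": {"daysAfterModificationGreaterThan": 30},
            "delete": {"daysAfterModificationGreaterThan": 365}
          }
        }
      }
    }
  ]
}
EOF
# Apply the policy to the storage account
az storage account management-policy create \
--account-name yourmlflowstorage \
--resource-group your-resource-group \
--policy @policy.json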
Security Best Practices
- Use managed identity instead of storage account keys when possible
- Enable Azure Storage encryption
- Configure network security with private endpoints
- Use Azure Key Vault for key management
- Regularly rotate storage account keys
- Enable soft delete for blob containers (see the command after this list)
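Blob and container soft delete can be enabled at the account level, for example with a 7-day retention window:
az storage account blob-service-properties update \
--account-name yourmlflowstorage \
--resource-group your-resource-group \
--enable-delete-retention true \
--delete-retention-days 7 \
--enable-container-delete-retention true \
--container-delete-retention-days 7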
Monitoring and Alerting
Storage Metrics
Enable storage analytics for monitoring:
az storage metrics update \
--account-name yourmlflowstorage \
--services b \
--api true \
--hour true \
--minute true \
--retention 7
Next Steps
- Set up authentication for MLflow UI
- Configure autoscaling for high availability
- Set up Azure Monitor for comprehensive monitoring
- Configure backup and disaster recovery strategies
- Consider AWS S3 or Google Cloud Storage as alternatives