Troubleshooting

This page covers common issues you may encounter when deploying and running PyPI Server on Kubernetes, along with debugging commands and solutions.

Common Issues

1. Pod Startup Issues

Pod Stuck in Pending State

Symptoms:

Pod remains in Pending state
No container logs are available because the container has not started

Debug Commands:

# Check pod status
kubectl get pods -l app.kubernetes.io/name=pypiserver

# Check pod events
kubectl describe pod <pod-name>

# Check node resources
kubectl describe nodes

# Check PVC status
kubectl get pvc
kubectl describe pvc pypi-packages-pvc

Solutions:

Ensure PVC exists and is bound
Check node resources and taints
Verify storage class is available
Check resource requests/limits
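
When a pod stays Pending, the scheduler usually records the reason as an event on the pod. The sketch below narrows the event list to a single pod so that reason is easier to spot. The function name pending_events and the default namespace are assumptions for illustration; the field selectors are standard ones for events.

```shell
# pending_events POD [NAMESPACE]
# Lists the events recorded for a single pod, oldest first.
pending_events() {
  pod="$1"
  ns="${2:-default}"
  kubectl get events -n "$ns" \
    --field-selector "involvedObject.kind=Pod,involvedObject.name=$pod" \
    --sort-by=.lastTimestamp
}
```

A FailedScheduling event typically names the blocking condition, such as insufficient CPU or memory, an untolerated taint, or an unbound PVC.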

Pod CrashLoopBackOff

Symptoms:

Pod repeatedly crashes and restarts
Status shows CrashLoopBackOff

Debug Commands:

# Check logs from the previous (crashed) container
kubectl logs <pod-name> --previous

# Check pod events
kubectl describe pod <pod-name>

# Check container status
kubectl get pods <pod-name> -o yaml

Common Causes:

Incorrect volume mounts
Permission issues
Missing configuration files
Resource constraints
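
The exit code of the last crashed container often narrows down which of these causes applies. A minimal sketch, assuming a single-container pod; the helper names last_exit_code and explain_exit_code are hypothetical, and the mapping covers only a few conventional exit codes.

```shell
# last_exit_code POD
# Prints the exit code of the previous run of the pod's first container.
last_exit_code() {
  kubectl get pod "$1" \
    -o jsonpath='{.status.containerStatuses[0].lastState.terminated.exitCode}'
}

# explain_exit_code CODE
# Maps a few conventional exit codes to a likely cause.
explain_exit_code() {
  case "$1" in
    0)   echo "exited cleanly; check the container command and restart policy" ;;
    1)   echo "application error; read the previous container logs" ;;
    126) echo "command found but not executable; check file permissions" ;;
    127) echo "command not found; check the image and container args" ;;
    137) echo "killed by SIGKILL (128+9); often an out-of-memory kill" ;;
    143) echo "terminated by SIGTERM (128+15); the container was asked to stop" ;;
    *)   echo "unrecognized exit code $1; check logs and events" ;;
  esac
}
```

For example, explain_exit_code "$(last_exit_code <pod-name>)" prints a one-line hint to follow up on.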

2. Storage Issues

Permission Denied Errors

Symptoms:

Permission denied in logs
Cannot write to package directory

Debug Commands:

# Check volume mounts
kubectl exec -it <pod-name> -- ls -la /data/

# Check file permissions
kubectl exec -it <pod-name> -- ls -la /data/packages/

# Check security context
kubectl get pod <pod-name> -o yaml | grep -A 10 securityContext

Solutions:

# Fix security context
podSecurityContext:
  runAsUser: 1000
  runAsGroup: 3000
  fsGroup: 2000
  fsGroupChangePolicy: OnRootMismatch

securityContext:
  runAsNonRoot: true
  runAsUser: 1000
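
After changing the security context, it helps to confirm that the container user can actually write to the package directory. The sketch below assumes the image provides sh, id, and touch; check_writable is a hypothetical helper name, and /data/packages is the directory used elsewhere on this page.

```shell
# check_writable POD [DIR]
# Prints the container's user and group IDs, then creates and removes
# a marker file in DIR to prove the directory is writable.
check_writable() {
  pod="$1"
  dir="${2:-/data/packages}"
  kubectl exec "$pod" -- sh -c \
    'id && touch "$1/.write-test" && rm "$1/.write-test" && echo "writable: $1"' \
    sh "$dir"
}
```

If the touch step fails, compare the uid and gid printed by id with the owner shown by ls -la /data/.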

Storage Full

Symptoms:

No space left on device errors
Cannot upload new packages

Debug Commands:

# Check storage usage
kubectl exec -it <pod-name> -- df -h

# Check package directory size
kubectl exec -it <pod-name> -- du -sh /data/packages/

# Check PVC capacity
kubectl get pvc pypi-packages-pvc

Solutions:

Increase PVC size
Clean up old packages
Implement storage monitoring
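
Increasing the PVC size can be done by patching the storage request on the existing claim. This is a sketch under assumptions: expand_pvc is a hypothetical helper, 20Gi is an example size, and the StorageClass must set allowVolumeExpansion: true for the request to be honored.

```shell
# expand_pvc NAME SIZE [NAMESPACE]
# Requests a larger size for an existing PVC,
# e.g. expand_pvc pypi-packages-pvc 20Gi
expand_pvc() {
  name="$1"
  size="$2"
  ns="${3:-default}"
  patch=$(printf '{"spec":{"resources":{"requests":{"storage":"%s"}}}}' "$size")
  kubectl patch pvc "$name" -n "$ns" --type merge -p "$patch"
}
```

A PVC can be grown but not shrunk. If the claim is managed by the Helm chart, changing the size in the values file may be the cleaner route, since a manual patch can be overwritten on the next upgrade.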

3. Network Issues

Service Not Accessible

Symptoms:

Cannot access PyPI Server from outside cluster
Connection refused errors

Debug Commands:

# Check service status
kubectl get svc -l app.kubernetes.io/name=pypiserver

# Check endpoints
kubectl get endpoints -l app.kubernetes.io/name=pypiserver

# Test service connectivity
kubectl run test-pod --image=busybox --rm -it --restart=Never -- wget -O- http://<service-name>:8080

# Check ingress status
kubectl get ingress
kubectl describe ingress <ingress-name>

Solutions:

Verify service configuration
Check ingress controller
Ensure port mappings are correct
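
A service with no ready endpoints is a common reason for connection refused errors, and it usually points to a selector that does not match the pod labels or to pods failing their readiness probe. A minimal sketch of that check; service_endpoints is a hypothetical helper name.

```shell
# service_endpoints SERVICE [NAMESPACE]
# Prints the pod IPs behind a service and fails if there are none.
service_endpoints() {
  svc="$1"
  ns="${2:-default}"
  ips=$(kubectl get endpoints "$svc" -n "$ns" \
    -o jsonpath='{.subsets[*].addresses[*].ip}')
  if [ -z "$ips" ]; then
    echo "no ready endpoints for $svc: compare the service selector with pod labels" >&2
    return 1
  fi
  echo "$ips"
}
```

When endpoints exist but external access still fails, kubectl port-forward svc/<service-name> 8080:8080 lets you test the service from your workstation while bypassing the ingress.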

Ingress Issues

Symptoms:

Ingress not routing traffic
SSL/TLS certificate issues

Debug Commands:

# Check ingress controller logs
kubectl logs -n ingress-nginx deployment/ingress-nginx-controller

# Check ingress events
kubectl describe ingress <ingress-name>

# Test ingress connectivity
curl -I https://pypi.yourdomain.com

Solutions:

Verify ingress annotations
Check certificate configuration
Ensure DNS resolution
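
For certificate problems, inspecting the certificate that the ingress actually serves is often quicker than reading controller logs. The sketch below assumes openssl is installed on your workstation; cert_dates is a hypothetical helper name and pypi.yourdomain.com is the placeholder host used on this page.

```shell
# cert_dates HOST [PORT]
# Prints the subject and validity window of the certificate served for HOST.
cert_dates() {
  host="$1"
  port="${2:-443}"
  openssl s_client -connect "$host:$port" -servername "$host" </dev/null 2>/dev/null \
    | openssl x509 -noout -subject -dates
}
```

Run it as cert_dates pypi.yourdomain.com. A subject that does not match your host name suggests the ingress is serving a default certificate instead of the one configured for this host.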

4. Authentication Issues

Password File Problems

Symptoms:

Authentication failures
Cannot access protected packages

Debug Commands:

# Check password file
kubectl exec -it <pod-name> -- cat /data/.htpasswd

# Check secret
kubectl get secret pypi-auth -o yaml

# Test authentication
curl -u username:password https://pypi.yourdomain.com/simple/

Solutions:

Verify password file format
Check secret configuration
Ensure proper volume mounting
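
A quick format check on the password file can rule out malformed entries. This sketch only verifies that each non-empty line has the shape username:hash; it does not validate the hash itself. check_htpasswd is a hypothetical helper name.

```shell
# check_htpasswd
# Reads an htpasswd file on stdin, reports malformed lines,
# and exits non-zero if any were found.
check_htpasswd() {
  awk -F: '
    /^[[:space:]]*$/ { next }    # ignore blank lines
    NF < 2 || $1 == "" || $2 == "" {
      printf "line %d: expected username:hash\n", NR
      bad = 1
    }
    END { exit bad }
  '
}
```

Pipe the file in without allocating a TTY, for example kubectl exec <pod-name> -- cat /data/.htpasswd | check_htpasswd, because the -t flag can add carriage returns to the output.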

5. Performance Issues

Slow Package Uploads/Downloads

Symptoms:

Long response times
Timeout errors

Debug Commands:

# Check resource usage
kubectl top pod <pod-name>

# Check CPU and memory as seen inside the container (usually reports node totals, not pod limits)
kubectl exec -it <pod-name> -- cat /proc/cpuinfo
kubectl exec -it <pod-name> -- cat /proc/meminfo

# Check outbound connectivity and latency
kubectl exec -it <pod-name> -- ping -c 3 google.com

Solutions:

Increase resource limits
Use SSD storage
Optimize server configuration
Add more replicas
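
Before changing resources, it can help to measure where the time goes in a single request. A minimal sketch using curl's built-in timing variables; time_request is a hypothetical helper name.

```shell
# time_request URL
# Prints the HTTP status and a timing breakdown for one request.
time_request() {
  curl -s -o /dev/null \
    -w 'status=%{http_code} connect=%{time_connect}s first_byte=%{time_starttransfer}s total=%{time_total}s\n' \
    "$1"
}
```

For example, time_request https://pypi.yourdomain.com/simple/ shows whether the delay occurs while connecting, before the first byte, or during the transfer.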

Debugging Commands

General Debugging

# Get all resources
kubectl get all -l app.kubernetes.io/name=pypiserver

# Check events
kubectl get events --sort-by='.lastTimestamp'

# Check logs with timestamps
kubectl logs <pod-name> --timestamps

# Follow logs in real-time
kubectl logs <pod-name> -f

# Check pod details
kubectl describe pod <pod-name>

# Check service details
kubectl describe svc <service-name>

Storage Debugging

# Check PVC status
kubectl get pvc
kubectl describe pvc pypi-packages-pvc

# Check storage usage
kubectl exec -it <pod-name> -- df -h
kubectl exec -it <pod-name> -- du -sh /data/packages/

# Check file permissions
kubectl exec -it <pod-name> -- ls -la /data/
kubectl exec -it <pod-name> -- id

# Check volume mounts
kubectl get pod <pod-name> -o yaml | grep -A 20 volumeMounts

Network Debugging

# Check service endpoints
kubectl get endpoints <service-name>

# Test service connectivity
kubectl run test-pod --image=busybox --rm -it --restart=Never -- wget -O- http://<service-name>:8080

# Check ingress status
kubectl get ingress
kubectl describe ingress <ingress-name>

# Test external connectivity
curl -I http://<service-ip>:8080
curl -I https://pypi.yourdomain.com

Application Debugging

# Check PyPI Server process
kubectl exec -it <pod-name> -- ps aux

# Check listening ports
kubectl exec -it <pod-name> -- netstat -tlnp

# Check environment variables
kubectl exec -it <pod-name> -- env | grep PYPISERVER

# Check configuration
kubectl exec -it <pod-name> -- cat /data/.htpasswd

Health Check Issues

Liveness Probe Failures

Symptoms:

Pod restarts frequently
Health endpoint returns errors

Debug Commands:

# Test health endpoint
kubectl exec -it <pod-name> -- wget -O- http://localhost:8080/health

# Check probe configuration
kubectl get pod <pod-name> -o yaml | grep -A 10 livenessProbe

# Check application logs
kubectl logs <pod-name> --previous

Solutions:

# Adjust probe settings
livenessProbe:
  httpGet:
    path: /health
    port: http
  initialDelaySeconds: 60
  periodSeconds: 30
  timeoutSeconds: 10
  failureThreshold: 3

Readiness Probe Failures

Symptoms:

Service endpoints not ready
Traffic not routed to pod

Debug Commands:

# Check readiness status
kubectl get pods -o wide

# Test readiness endpoint
kubectl exec -it <pod-name> -- wget -O- http://localhost:8080/health

# Check probe configuration
kubectl get pod <pod-name> -o yaml | grep -A 10 readinessProbe

Resource Issues

Memory Issues

Symptoms:

OOMKilled errors
High memory usage

Debug Commands:

# Check memory usage
kubectl top pod <pod-name>

# Check memory limits
kubectl get pod <pod-name> -o yaml | grep -A 5 resources

# Check memory stats (inside a container this usually reports node totals, not the pod's limit)
kubectl exec -it <pod-name> -- cat /proc/meminfo

Solutions:

# Increase memory limits
resources:
  limits:
    memory: 4Gi
  requests:
    memory: 2Gi

CPU Issues

Symptoms:

Slow response times
High CPU usage

Debug Commands:

# Check CPU usage
kubectl top pod <pod-name>

# Check CPU limits
kubectl get pod <pod-name> -o yaml | grep -A 5 resources

# Check CPU details (inside a container this usually reports the node's CPUs, not the pod's limit)
kubectl exec -it <pod-name> -- cat /proc/cpuinfo

Configuration Issues

Values File Problems

Symptoms:

Helm install/upgrade fails
Unexpected behavior

Debug Commands:

# Lint the chart with your values file
helm lint . -f values.yaml

# Check rendered templates
helm template . -f values.yaml

# Validate rendered manifests against the cluster API
helm template . -f values.yaml --validate

Environment Variable Issues

Symptoms:

Application not using expected configuration
Missing environment variables

Debug Commands:

# Check environment variables
kubectl exec -it <pod-name> -- env

# Check specific variable (single quotes stop the local shell from expanding it)
kubectl exec -it <pod-name> -- sh -c 'echo "$PYPISERVER_PORT"'

# Check deployment configuration
kubectl get deployment <deployment-name> -o yaml | grep -A 10 env
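
A variable written as $NAME on the kubectl command line is expanded by your local shell before kubectl runs, so the container never sees it. One way around this is to pass only the variable name and let printenv look it up inside the container. This is a sketch that assumes the image ships printenv; pod_env is a hypothetical helper name.

```shell
# pod_env POD NAME
# Prints the value of one environment variable as seen inside the container.
# Exits non-zero if the variable is not set there.
pod_env() {
  kubectl exec "$1" -- printenv "$2"
}
```

For example, pod_env <pod-name> PYPISERVER_PORT prints the port configured in the container, regardless of what is set in your local shell.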

Recovery Procedures

Pod Recovery

# Restart deployment
kubectl rollout restart deployment <deployment-name>

# Check rollout status
kubectl rollout status deployment <deployment-name>

# Rollback to previous version
kubectl rollout undo deployment <deployment-name>
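
The three commands above can be chained so that a restart which never becomes ready is rolled back automatically. A minimal sketch; restart_or_rollback is a hypothetical helper name and the 120s default timeout is an arbitrary choice.

```shell
# restart_or_rollback DEPLOYMENT [TIMEOUT]
# Restarts a deployment and rolls back if it is not ready within TIMEOUT.
restart_or_rollback() {
  deploy="$1"
  timeout="${2:-120s}"
  kubectl rollout restart "deployment/$deploy" || return 1
  if kubectl rollout status "deployment/$deploy" --timeout="$timeout"; then
    echo "rollout of $deploy completed"
  else
    echo "rollout of $deploy did not finish within $timeout; rolling back" >&2
    kubectl rollout undo "deployment/$deploy"
    return 1
  fi
}
```

The function returns non-zero whenever a rollback was needed, so it can be used as a gate in a script or CI job.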

Data Recovery

# Create backup before troubleshooting
kubectl exec -it <pod-name> -- tar -czf /tmp/backup.tar.gz /data/packages/

# Copy backup from pod
kubectl cp <pod-name>:/tmp/backup.tar.gz ./backup.tar.gz

# Restore data if needed
kubectl cp ./backup.tar.gz <new-pod-name>:/tmp/
kubectl exec -it <new-pod-name> -- tar -xzf /tmp/backup.tar.gz -C /
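
Before relying on a backup, it is worth checking that the archive is readable and actually contains package files. The sketch below runs locally on the copied archive; verify_backup is a hypothetical helper name, and it assumes tar stored the paths without the leading slash, which is the default behavior of common tar implementations.

```shell
# verify_backup ARCHIVE
# Confirms the archive is readable and has entries under data/packages/.
verify_backup() {
  archive="$1"
  # grep -c prints 0 and exits non-zero when nothing matches, hence "|| true"
  count=$(tar -tzf "$archive" 2>/dev/null | grep -c '^data/packages/.') || true
  if [ "${count:-0}" -eq 0 ]; then
    echo "backup $archive is unreadable or contains no packages" >&2
    return 1
  fi
  echo "backup $archive: $count entries under data/packages/"
}
```

Run verify_backup ./backup.tar.gz after the kubectl cp step and before making any destructive change to the pod or its volume.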

Service Recovery

# Recreate the service (a service cannot be restarted; delete it and re-apply the manifest)
kubectl delete svc <service-name>
kubectl apply -f service.yaml

# Check service endpoints
kubectl get endpoints <service-name>

# Test service connectivity
kubectl run test-pod --image=busybox --rm -it --restart=Never -- wget -O- http://<service-name>:8080

Monitoring and Alerting

Set Up Monitoring

# Add monitoring annotations
podAnnotations:
  prometheus.io/scrape: "true"
  prometheus.io/port: "8080"
  prometheus.io/path: "/metrics"

# Create ServiceMonitor for Prometheus
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: pypi-monitor
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: pypiserver
  endpoints:
    - port: http
      path: /metrics

Common Alerts

# Example Prometheus alert rules
groups:
  - name: pypi-server
    rules:
      - alert: PyPIServerDown
        expr: up{app="pypiserver"} == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "PyPI Server is down"

      - alert: PyPIServerHighMemory
        expr: container_memory_usage_bytes{container="pypiserver"} > 3e9
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "PyPI Server memory usage is high"

Next Steps

Review configuration for proper setup
Learn about storage configuration
Explore advanced configuration options
Check the usage guide for deployment examples
